Preface


This book gives an overview of cutting-edge work on a new paradigm called the 
“sublinear computation paradigm,” which was proposed in the large multiyear 
academic research project “Foundations of Innovative Algorithms for Big Data” in 
Japan. In today's rapidly evolving age of big data, massive increases in big data 
have led to many new opportunities and uncharted areas of exploration, but have 
also brought new challenges. To handle the unprecedented explosion of big data 
sets in research, industry, and other areas of society, there is an urgent need to 
develop novel methods and approaches for big data analysis. To meet this need, we 
are pursuing innovative changes in algorithm theory for big data. For example, 
polynomial-time algorithms have thus far been regarded as “fast,” but if we apply 
an O(n^2)-time algorithm to a petabyte-scale or larger big data set, we will encounter
problems in terms of computational resources or running time. To deal with this 
critical computational and algorithmic bottleneck, we require linear, sublinear, and 
constant-time algorithms. In this project, which ran from October 2014 to 
September 2021, we have proposed the sublinear computation paradigm in order to 
support innovation in the big data era. We have created a foundation of innovative 
algorithms by developing computational procedures, data structures, and modeling 
techniques for big data. The project is organized into three teams that focus on 
sublinear algorithms, sublinear data structures, and sublinear modeling. Our work 
has provided high-level academic research results of strong computational and 
algorithmic interest, which are presented in this book. 

This book consists of five parts: Part I, which consists of a single chapter 
introducing the concept of the sublinear computation paradigm; Parts II, Ш, and IV 
review results on sublinear algorithms, sublinear data structures, and sublinear 
modeling, respectively; and Part V presents some application results. 
Part I
Introduction 


Chapter 1
What Is the Sublinear Computation Paradigm?


Naoki Katoh and Hiro Ito 


Abstract This chapter introduces the “sublinear computation paradigm.” A sublinear- 
time algorithm is an algorithm that runs in time sublinear in the size of the instance 
(input data). In other words, the running time is o(n), where n is the size of the 
instance. This century marks the start of the era of big data. In order to manage 
big data, polynomial-time algorithms, which are considered to be efficient, may 
sometimes be inadequate because they may require too much time or computational 
resources. In such cases, sublinear-time algorithms are expected to work well. We call 
this idea the “sublinear computation paradigm.” A research project named “Foun- 
dations on Innovative Algorithms for Big Data (ABD),” in which this paradigm is 
the central concept, was started under the CREST program of the Japan Science and 
Technology Agency (JST) in October 2014 and concluded in September 2021. This 
book mainly introduces the results of this project. 


1.1 We Are in the Era of Big Data


The twenty-first century can be called the era of Big Data. The number of webpages
on the Internet was estimated to be more than 1 trillion (= 10^12) in 2008 [22], and
the number of websites has grown tenfold over the past 10 years [21]. Thus the number of
webpages is estimated to be more than 10 trillion (= 10^13) now. If we assume that
10^6 bytes (≈ 10^7 bits) of data is contained in a single webpage on average,¹ then the
total amount of the data stored on the Internet would be more than 100 exabits (= 10^20
bits)! The various actions that everyone performs are collected by our smartphones
and are stored in the memory of storage devices around the world. The remarkable 
development of computer memory has made it possible to store this information. 


1 Note that one 1080 × 1920 pixel digital photo consists of more than 2 × 10^6 pixels.
However, the ability to store data and the ability to make good use of the data are
different problems. The speed of data transfer using IEEE 802.11ac is 6.9 Gbps.
Using this, it would take 1.7 days to read 1 petabit (10^15 bits) of data. To read 1 exabit
(10^18 bits) of data, we would need over 4 years! Although the speed of data transfer
is expected to continue to increase, the amount of available data is also expected to 
grow even faster. 
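
The read times quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes a sustained 6.9 Gbps link with no protocol overhead and a 365-day year; the function and constant names are chosen here for illustration only.

```python
# Back-of-the-envelope check of the read times quoted in the text.
# Assumption: the link sustains 6.9 Gbps with no overhead.
LINK_BITS_PER_SECOND = 6.9e9
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def read_time_seconds(num_bits: float) -> float:
    """Time needed just to transfer num_bits over the link."""
    return num_bits / LINK_BITS_PER_SECOND


petabit_days = read_time_seconds(1e15) / SECONDS_PER_DAY
exabit_years = read_time_seconds(1e18) / SECONDS_PER_YEAR

print(f"1 petabit: {petabit_days:.2f} days")
print(f"1 exabit:  {exabit_years:.2f} years")
```

Under these assumptions the results agree with the figures in the text: a little under 1.7 days for one petabit and somewhat more than 4 years for one exabit.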

This situation can create new problems that did not arise in past centuries, such 
as requiring a huge amount of time just to read an entire dataset. We are thus faced 
with new problems in terms of computation. 


1.2 Theory of Computational Complexity
and Polynomial-Time Algorithms


In the area of the theory of computational complexity, the term “polynomial-time
algorithms” is often used as a synonym for “efficient algorithms.” A polynomial-time
algorithm is an algorithm that runs in time expressed by a polynomial function of
the size of the instance (i.e., the input). For example, consider the sorting problem
that takes a set of positive integers a_1, ..., a_n as input and outputs a permutation
π : {1, ..., n} → {1, ..., n} such that a_{π(i)} ≤ a_{π(i+1)} for every i ∈ {1, ..., n − 1}.
In this problem, the input is expressed by n integers and thus the input size is n. 

We now briefly introduce the theory of computational complexity. Theoretically,
the computation time of an algorithm is expressed in terms of the number of basic
units of calculation (i.e., the basic arithmetic operations, reading or writing a value in
a cell in memory, and comparison of two values²). The complexity is then expressed
as a function of n, say T(n), where n is the (data) size of the input. If there exists
a fixed integer k such that T(n) = O(n^k), then we say that the algorithm runs in
polynomial time.

For example, the sorting problem can be solved in O(n log n) time, which is
polynomial, and it has been proven that this is the minimum in the big-O sense for
comparison-based algorithms, meaning that no such algorithm runs in o(n log n) time.
In contrast, for the
partitioning problem, which is the problem of finding a subset B of a given set A
consisting of n integers a_1, ..., a_n such that Σ_{a_i ∈ B} a_i = (1/2) Σ_{a_i ∈ A} a_i, no
polynomial-time algorithms have been found and the majority of researchers believe that no such
algorithm exists.⁴


2 More rigorously, representing an integer a requires around log_2 a bits. However, in the area of the
theory of computational complexity, we usually use the assumption that one integer is stored in one
cell (byte) of the memory. Since this assumption may cause some strange results if pathologically
huge integers are used, these integers are prohibited.

3 In order to avoid excessive calculations, we assume that each integer consists of at most log_2 n
bits, where n is the number of integers treated in the instance.

4 This is equivalent to the well-known “P vs. NP problem,” which is one of the seven Millennium
Prize open problems in mathematics.
For many problems, constructing an exponential-time algorithm is easy. For the
partitioning problem, for example, an algorithm that tests all subsets of A clearly
solves the problem, and this requires 2^n · O(n) time, which is exponential. Therefore,
the existence of an exponential-time algorithm is considered to be trivial for many 
cases. Constructing polynomial-time algorithms, however, requires additional ideas 
in many cases. 
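
The exhaustive algorithm for the partitioning problem mentioned above can be sketched as follows. This is only an illustration of the "test all subsets" idea; the function name and the choice to return the indices of the subset B are assumptions made here.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def brute_force_partition(a: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Return indices of a subset B whose sum is half the total, or None.

    All 2^n subsets are examined in the worst case and each check
    costs O(n), so the running time is 2^n * O(n).
    """
    total = sum(a)
    if total % 2 != 0:
        return None  # an odd total can never be split into equal halves
    n = len(a)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            # Compare 2 * sum(B) with sum(A) to stay in integer arithmetic.
            if 2 * sum(a[i] for i in subset) == total:
                return subset
    return None
```

For example, for the input [3, 1, 1, 2, 2, 1] the call returns a set of indices whose values sum to 5, whereas for [1, 1, 4] it returns None after trying every subset.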


1.3 Polynomial-Time Algorithms and Sublinear-Time
Algorithms


1.3.1 A Brief History of Polynomial-Time Algorithms 


The idea that “polynomial-time algorithms are efficient” is sometimes called Cobham’s
Thesis or the Cobham–Edmonds Thesis, which is named after Alan Cobham and
Jack Edmonds [4]. Cobham [3] identified tractable problems with the complexity 
class P, which is the class of problems solvable in polynomial-time with respect to 
the input size. Edmonds also stated the same thing in [7]. 

Although these papers were published in 1965, the idea behind this thesis seems
to have been a commonly held belief among researchers in the late 1950s. For
example, Kruskal’s algorithm and Prim’s algorithm, which are both almost
linear-time algorithms for the minimum spanning tree problem, were presented in 1956
[16] and 1957 [17], respectively. Dijkstra’s algorithm, which is an almost linear-time 
algorithm for the shortest path problem with positive edge lengths, was presented 
in 1959 [6]. Ford and Fulkerson presented the augmenting path algorithm for the 
maximum flow problem in 1956 [8]. The blossom algorithm was proposed by Jack 
Edmonds in 1961 for the maximum matching problem on general (i.e., not necessarily 
bipartite) graphs [7]. 

In 1971, Cook proposed the idea of NP-completeness and proved that the
satisfiability problem (SAT) is NP-complete [5]. NP-complete problems are intuitively
the most difficult problems in the class NP. NP is the set of problems that can
be solved in polynomial time by nondeterministic Turing machines. Although we
do not have a proof yet, many researchers believe that no polynomial-time
algorithms exist for any NP-complete problems.⁵ Cook’s study created a new field of
research through which countless combinatorial problems have been found
to be NP-complete [10].

By definition, it is trivial that every problem in NP can be solved in exponen- 
tial time (by a Turing machine). The theory of NP-completeness explicitly and 
firmly fixed the idea that “polynomial-time algorithms are efficient” in the minds of 
researchers. We would like to call this idea the polynomial computation paradigm. 


5 This is the “P vs. NP problem,” which was named one of the seven Millennium Prize open problems in
mathematics at the end of the twentieth century.
Many important polynomial-time algorithms are now known, including the two
basic polynomial-time algorithms for the linear programming problem (LP), namely 
the ellipsoid method proposed by Khachiyan in 1979 [15] and the interior-point 
method proposed by Karmarkar in 1984 [13], the strongly polynomial-time algorithm 
for the minimum cost flow problem proposed by Eva Tardos in 1985 [19], the linear- 
time shortest path algorithm with positive integer edge lengths proposed by Mikkel 
Thorup in 1997 [20], and the deterministic polynomial-time algorithm for primality 
test proposed by Agrawal, Kayal, and Saxena in 2002 [1]. These algorithms pioneered 
new perspectives in the field of algorithm research. They are gems that were found 
under the polynomial computation paradigm. 


1.3.2 Emergence of Sublinear-Time Algorithms 


Although linear-time algorithms have naturally been considered the fastest, since
intuitively we basically have to read all the data when solving a problem, the new idea of
“sublinear-time algorithms” emerged at the end of the twentieth century.
Sublinear-time algorithms run by reading only a sublinear (i.e., o(n)) amount of data from the
input.

The most popular framework for sublinear-time algorithms is “property testing.”
This idea was first presented by Rubinfeld and Sudan [18] in 1996 (although a
conference version had appeared earlier, in 1992) in the context of program
checking. In this paper, they introduced the ideas of “distance” between an instance
(e.g., a function) and a property (e.g., linearity), and “ε-farness.” They also gave
constant-time testers for some properties of functions. The notion of constant-time
testability of combinatorial (mainly graph) structures was first
given by Goldreich, Goldwasser, and Ron [11], whose work was presented at a conference in
1995 (STOC’95). After the turn of the century, many studies that follow this idea of
testability have appeared and the importance of this field is growing [2, 9].


1.3.3 Property Testing and Parameter Testing 


We say that an algorithm is a testing algorithm (or tester for short) for a property P
if it accepts a given instance I with probability at least 2/3 if I has P and rejects it
with probability at least 2/3 if I is far from having P. Here P is defined as a (generally
infinitely large) subset of instances. The distance between I and P is defined as the minimum
Hamming distance between I and any I′ ∈ P. The distance is normalized to be in [0, 1]
(i.e., dist(I, P) ∈ [0, 1]). If an instance has the property, the distance is zero (i.e.,
dist(I, P) = 0 if I ∈ P). If dist(I, P) > ε for an ε ∈ [0, 1], then we say that I is
ε-far from P and otherwise ε-close. A tester rejects I with probability at least 2/3 if
I is ε-far from P.
For a property, if a tester exists whose running time⁶ is bounded by a constant
independent of the size of the input, then we say that the property is testable.⁷ This
framework is called property testing.
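
To make the definition concrete, here is a minimal sketch of a tester for a deliberately simple property that is not discussed in the text: "the 0/1 string of length n contains no 1." Under the normalized Hamming distance, an ε-far instance contains more than εn ones, so sampling ⌈ln 3 / ε⌉ random positions finds a 1 with probability at least 2/3. The property, the function name, and the `query` interface are all assumptions made for illustration.

```python
import math
import random
from typing import Callable


def all_zeros_tester(query: Callable[[int], int], n: int, eps: float,
                     rng: random.Random) -> bool:
    """Accept (True) or reject (False) using a number of queries independent of n.

    query(i) returns the i-th symbol (0 or 1) of the instance.
    An all-zero instance is always accepted. An instance that is eps-far
    has more than eps * n ones, so all samples miss them with probability
    at most (1 - eps)^s <= exp(-eps * s) <= 1/3 for s = ceil(ln(3) / eps).
    """
    samples = math.ceil(math.log(3) / eps)
    for _ in range(samples):
        if query(rng.randrange(n)) == 1:
            return False  # a witness was found: the property is violated
    return True
```

The number of queries depends only on ε, which is what makes this toy property testable in the sense defined above. The tester also never rejects an instance that has the property.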

Property testing is a relaxation of the framework of decision problems. In contrast, 
a relaxation of the framework of optimization problems is parameter testing. In 
parameter testing, we try to find an approximation to the value of the objective 
function with an additive error of at most εN from the optimum value, where N is
the maximum value of the objective function. 

This idea appeared at the end of the twentieth century, and was further developed 
in this century. See Chaps. 2 and 3 for these themes.


1.4 Ways to Decrease Computational Resources 


In addition to property and parameter testing, there are various methods for decreasing
the amount of computational resources needed for handling big data. Although some 
methods may require linear computation, each of them has strong merits. We briefly 
introduce these methods in this section. 


1.4.1 Streaming Algorithms 


Property testing generally uses the assumption that an algorithm can read any position 
(cell) of the input. However, this may be difficult in some situations, such as if the 
data arrives as a stream (sequence) and the algorithm is required to read the values 
one by one in the order of arrival. The key assumption of this framework is that an 
algorithm does not have enough memory to store the entire input. For example, to
find the maximum value in a sequence of integers a_1, ..., a_n, it is enough to use O(1)
cells of memory.⁸
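
The maximum-value example can be written as a one-pass procedure. The sketch below assumes the stream is any Python iterable that yields the integers in order of arrival; the function name is illustrative.

```python
from typing import Iterable, Optional


def streaming_max(stream: Iterable[int]) -> Optional[int]:
    """Return the maximum of a stream while storing a single value.

    Each element is read once, in arrival order, and then discarded,
    so the working memory is O(1) cells regardless of the stream length.
    """
    best: Optional[int] = None
    for value in stream:
        if best is None or value > best:
            best = value
    return best
```

Because the function accepts a generator, the input never needs to be held in memory as a whole, for example `streaming_max(x * x % 97 for x in range(10**6))`.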

Although this method requires linear computation time, since it must read all of 
the data, the amount of memory is constant in many cases. If we assume that the 
order of data arrival in the stream is random, then it becomes close to the setting of 
(nonadaptive⁹) property testing. In this book, streaming algorithms are covered in
Chap. 16. 


6 Normally we also use the “query complexity” besides the running time. See Chap. 2 for details.
7 Sometimes “testable” means that the problem has an algorithm with a sublinear query complexity,
and strongly testable may be used for distinguishing constant query complexity from mere sublinear
query complexity.

8 We assume that each memory cell can store any one integer among {a_1, ..., a_n}.

9 Nonadaptive means that the query (of an algorithm) cannot depend on any answer of the queries; 
in other words, the queries are fixed before the algorithm starts. 
1.4.2 Compression


Compression is a traditional and typical method for treating digital data. Basically, 
there are two types of compression: one type is compression of data without losing 
any information. In this type of compression, there is an information-theoretical lower 
bound on the data size. This method is used when the original data needs to be
reconstructed perfectly from the compressed data, and it is thus called lossless compression
or reversible compression. The other type of compression allows discarding of some
of the data such that the compressed data is an inexact approximation. Although some 
of these algorithms can compress data drastically, it is not possible to reconstruct the 
original data perfectly from the compressed data, and these algorithms are thus called 
lossy compression or irreversible compression. This method works remarkably well 
in the field of music and image compression. See Chaps. 6, 7, 10, and 16 in this book 
for results from this area. 
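
The difference between the two types of compression can be illustrated with a small experiment. The sketch below uses the standard `zlib` module for the lossless case and, for the lossy case, a toy quantization that keeps only the top 4 bits of each 8-bit sample. The quantization scheme and the synthetic `signal` are assumptions made here and are not methods from this book.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress with zlib (DEFLATE) and decompress again."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(samples: bytes) -> bytes:
    """Toy lossy scheme: keep only the top 4 bits of each 8-bit sample."""
    return bytes(s & 0xF0 for s in samples)


# Hypothetical 8-bit "signal" used only for illustration.
signal = bytes((7 * i) % 256 for i in range(4096))

restored = lossless_roundtrip(signal)   # identical to the original
approx = lossy_quantize(signal)         # close to, but not equal to, the original
max_error = max(abs(a - b) for a, b in zip(signal, approx))
```

The lossless round trip reproduces the input exactly, while the quantized version differs from it by a bounded amount per sample and the discarded low-order bits cannot be recovered.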


1.4.3 Succinct Data Structures 


When compressed data is used, it essentially needs to be decompressed. However, 
decompression requires extra computation. It is therefore useful to be able to use 
compressed data as-is without decompression. Succinct data structures are a frame- 
work that realizes this idea. Specifically, succinct data structures use an amount of 
space that is close to the information-theoretical lower bound while still allowing 
efficient (fast) query operations. These structures involve a tradeoff between space 
and time. See Chaps. 8 and 9 for details. 
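
As a rough illustration of answering queries without scanning the whole structure, the following sketch implements the rank query on a bit vector (the number of 1s before a given position) using precomputed per-block counts. It shows only the query mechanism: a Python list does not achieve the space bounds of a real succinct data structure, and the class name and the block size are assumptions made here.

```python
from typing import Iterable, List


class RankBitVector:
    """Bit vector with block-wise prefix counts for rank queries."""

    def __init__(self, bits: Iterable[int], block_size: int = 64) -> None:
        self.bits: List[int] = list(bits)
        self.block_size = block_size
        # block_ranks[j] = number of 1s in bits[0 : j * block_size]
        self.block_ranks: List[int] = [0]
        for start in range(0, len(self.bits), block_size):
            ones = sum(self.bits[start:start + block_size])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i: int) -> int:
        """Number of 1s among bits[0:i], for 0 <= i <= len(bits)."""
        block, offset = divmod(i, self.block_size)
        start = block * self.block_size
        # One table lookup plus a scan of at most block_size - 1 bits.
        return self.block_ranks[block] + sum(self.bits[start:start + offset])
```

A query touches one stored count and at most one block, so its cost depends on the block size rather than on the length of the vector. This reflects the tradeoff between extra space (the table of counts) and query time mentioned above.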


1.5 Need for the Sublinear Computation Paradigm 


1.5.1 Sublinear and Polynomial Computation Are Both 
Important 


Even though the sublinear computation paradigm has become necessary, it does not 
mean that the polynomial computation paradigm is obsolete. Polynomial computa- 
tion is still important in normal computations. The typical cases where the sublinear 
computations are needed are when we need to treat big data. In such cases, traditional 
polynomial computation is sometimes too slow. 

This relationship between the polynomial computation paradigm and the sub- 
linear computation paradigm is analogous to the relationship between Newtonian 
mechanics and the theory of relativity in physics. While Newtonian mechanics is used
for normal physical calculations, the theory of relativity is needed if we try to
calculate the motion of very fast objects such as rockets, satellites, or electrons. We entered
the era of the theory of relativity in the twentieth century and the era of sublinear
computation in the twenty-first century.


1.5.2 Research Project ABD 


A research project named “Foundations on Innovative Algorithms for Big Data
(ABD),” in which the sublinear computation paradigm is the central concept, was
started by JST, CREST, Japan in October 2014 and concluded in September 2021.
The total budget was more than 300 million yen. Although the project had 24
members at its inception, many more researchers later joined and the final number of
regular members exceeded 40 in total. The leader of the project was Prof. Naoki
Katoh of the University of Hyogo. The project consisted of three groups: the
Sublinear Algorithm Group (Team A) led by Prof. Katoh; the Sublinear Data Structure
Group (Team D) led by Prof. Tetsuo Shibuya of the University of Tokyo; and the 
Sublinear Modeling Group (Team M) led by Prof. Kazuyuki Tanaka of Tohoku Uni- 
versity. In this project, we worked on problems in big data computation. The main 
purpose of this book is to introduce the results of this project. A special issue of The 
Review of Socionetwork Strategies [14] is also available for this project. While some 
of the methods adopted in this project are not sublinear, we are confident that every 
piece of research concluded under the project is useful and will form the foundations 
of innovative algorithms for big data! 


1.5.3 The Organization of This Book 


This part of the book, Part I, has provided an introduction. Parts II, III, and IV present 
the theoretical results of Teams A, D, and M, respectively. Application results leading 
to scientific and technological innovation are compiled in Part V. 
Part II
Sublinear Algorithms 


Chapter 2
Property Testing on Graphs and Games


Hiro Ito 


Abstract Constant-time algorithms are powerful tools, since they run by reading 
only a constant-sized part of each input. Property testing is the most popular research 
framework for constant-time algorithms. In property testing, an algorithm determines 
whether a given instance satisfies some predetermined property or is far from satis- 
fying the property with high probability by reading a constant-sized part of the input. 
A property is said to be testable if there is a constant-time testing algorithm for the 
property. This chapter covers property testing on graphs and games. The fields of 
graph algorithms and property testing are two of the main streams of research on 
discrete algorithms and computational complexity. In the section on graphs in this 
chapter, we present some important results, particularly on the characterization of 
testable graph properties. At the end of the section, we show results that we
published in 2020 on a complete characterization (necessary and sufficient condition) of
testable monotone or hereditary properties in bounded-degree digraphs. In the
section on games, we present results that we published in 2019 showing that
generalized chess, Shogi (Japanese chess), and Xiangqi (Chinese chess) are all testable.
We believe that these are the first results for testable EXPTIME-complete problems.


2.1 Introduction


The development of efficient algorithms for problems on big data is an
urgent task. Constant-time algorithms are a powerful tool for this since they run by 
reading only a constant-sized part of each input. In other words, the running time 
is invariant regardless of the size of the input. Property testing is the most popular 
research framework for constant-time algorithms. In property testing, an algorithm 
determines whether a given instance satisfies some predetermined property or is far 
from satisfying that property with high probability by reading a constant-sized part 
of the input. This section presents some results mainly concerning property testing 
that have recently been obtained in the ABD Project. 
2.2 Basic Terms and Definitions for Property Testing


This section gives some of the basic terms that are needed in order to explain our 
results. Property testing works on many different types of models, including graphs, 
functions, strings, grammars, and images. Although the details of the definitions 
differ slightly between the different models, since the basic ideas are the same for 
all models, we present only the definitions for digraphs.

Let N = {0, 1, 2, ...} be the set of natural numbers. In this chapter, we sometimes
omit floor or ceiling functions. For example, if we write s = √n in a context where
s must be an integer and n is not necessarily a square number, then √n should be
taken to mean ⌊√n⌋ or ⌈√n⌉. This allows us to disregard integrality issues that make
no real difference to any of our proofs. 


2.2.1 Graphs and the Three Models for Property Testing 


A directed graph or digraph G is defined as a pair of finite sets (V, E), where V is 
a finite set of vertices and E ⊆ V × V is a set of directed edges, or edges for short.
The vertex set V and the edge set E of a graph G are sometimes written as V_G and
E_G, respectively. If the direction of each edge is ignored (i.e., (u, v) = (v, u) for any
u, v ∈ V), then the digraph is called a graph (or an undirected graph if we want to
indicate undirectedness explicitly). Every graph can be represented as a digraph
with a symmetric edge set; in other words, if (u, v) ∈ E, then (v, u) ∈ E for every
u, v ∈ V. Thus, graphs can be regarded as special cases of digraphs. This section
mainly treats (undirected) graphs. Digraphs are considered in Sect. 2.4. Many of the 
terms and symbols we define for graphs are also used for digraphs. 

The order of a graph G is given by |V_G| and the size of a graph G is given by
|E_G|. A graph (resp., digraph) of order n is also called an n-graph (resp., n-digraph).
The number of vertices adjacent to a vertex v in a graph G is denoted by deg_G(v).
If G is clear from the context, the subscript G may be omitted. In property testing,
since an algorithm reads only a part of an instance (input), it gets information about
the instance through oracles, which depend on how the graphs are represented.
There are three known models for treating graphs in property testing: the dense-graph 
model; the bounded-degree (graph) model; and the general-graph model. 

In the dense-graph model, the edge oracle is used: If an algorithm queries whether
(u, v) ∈ E or not, the oracle answers correctly: the answer is 1 if (u, v) ∈ E and
0 otherwise. This model basically treats dense (i.e., |E| = Θ(n^2)) graphs. This is
because if |E| = o(n^2), then the edge oracle answers “0” almost every time when n
is large, making the queries useless.¹

1 It works only for determining whether a given graph is sparse.

In the bounded-degree model, there is a restriction such that the degree of every
vertex is bounded by a predetermined integer d ≥ 1, that is, deg(v) ≤ d (∀v ∈ V).
From this restriction, it follows that the number of edges in a graph is at most dn/2 (or
dn for a digraph); in other words, the graph is sparse (note that d is a constant). This
model assumes that for every vertex v, the vertices adjacent to v are ordered. This
model uses the adjacent-vertex oracle: If an algorithm queries for the ith (1 ≤ i ≤ d)
adjacent vertex of v by giving a pair (v, i), the oracle answers the name (ID) of the
vertex if it exists and returns a predetermined special symbol such as ⊥ otherwise. A
graph where the degree is bounded by d is also called a d-bounded-degree graph.

The general-graph model is a mixed model of the dense-graph model and the 
bounded-degree model. Although this model does not have any maximum degree- 
bound, there is a fixed upper bound d on the average degree. In many cases d is a 
constant and the graphs in this model are sparse. However, if d = O (n), graphs in 
the model may be dense. This model allows all oracles that are allowed in the other 
two models in addition to the degree oracle: If an algorithm queries the degree of a 
vertex v, it replies with the correct answer deg(v). 
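
The three kinds of oracle described above can be mimicked by a small wrapper around an adjacency-list representation. The following is a sketch for illustration only: the class name `GraphOracles`, the method names, and the use of `None` in place of the special symbol ⊥ are assumptions made here. The query counter records how many oracle calls an algorithm makes.

```python
from typing import Dict, List, Optional


class GraphOracles:
    """Query access to a graph given as ordered adjacency lists."""

    def __init__(self, adjacency: Dict[int, List[int]]) -> None:
        self._adj = adjacency
        self._edges = {(u, v) for u, nbrs in adjacency.items() for v in nbrs}
        self.queries = 0  # number of oracle calls made so far

    def edge(self, u: int, v: int) -> int:
        """Edge oracle (dense-graph model): 1 if (u, v) is an edge, else 0."""
        self.queries += 1
        return 1 if (u, v) in self._edges else 0

    def neighbor(self, v: int, i: int) -> Optional[int]:
        """Adjacent-vertex oracle: the i-th (1-based) neighbor of v, or None."""
        self.queries += 1
        nbrs = self._adj[v]
        return nbrs[i - 1] if 1 <= i <= len(nbrs) else None

    def degree(self, v: int) -> int:
        """Degree oracle (general-graph model): the degree of v."""
        self.queries += 1
        return len(self._adj[v])
```

For the path 0 - 1 - 2 stored as `{0: [1], 1: [0, 2], 2: [1]}`, `edge(0, 2)` returns 0, `neighbor(1, 2)` returns 2, and `neighbor(0, 2)` returns `None` because vertex 0 has only one neighbor.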


2.2.2 Properties, Distances, and Testers 


The set of graphs considered in each model (that is, the dense-graph model, the
bounded-degree model, or the general-graph model) is denoted by Γ. The subset of
Γ such that the order of the graph is n is denoted by Γ_n. Hence Γ = ⋃_{n ∈ N} Γ_n.

A property is defined as a (generally infinitely large) subset of graphs closed under
isomorphism.² For example, “planarity” is defined as the set of all planar graphs. For
a property P, we define P_n as P ∩ Γ_n. Thus, clearly P = ⋃_{n ∈ N} P_n.

Property testing is a relaxation of a decision problem. The objective of property
testing is to distinguish with high probability whether a given instance satisfies some
predetermined property or is “far” from satisfying the property. This
requires a mathematical definition of “far.” 

Let G and G′ both be n-graphs; G, G′ ∈ Γ_n. The distance between the two graphs
is defined as the Hamming distance between them divided by the largest Hamming
distance in the model (for normalization). Thus, the distance depends on the
model (i.e., on how the graphs are represented). We explain this by using the
dense-graph model. Let δ_{E_G} : V × V → {0, 1} be the characteristic function of E_G, that
is, δ_{E_G}(u, v) = 1 if (u, v) ∈ E_G and 0 otherwise. We denote by m(G, G′) the
number of edges that need to be deleted from and/or inserted into G in order to
make G = G′, i.e.,

$$ m(G, G') := \left|\{(u, v) \in V \times V \mid \delta_{E_G}(u, v) \neq \delta_{E_{G'}}(u, v)\}\right| $$

Using this, we define the distance between G and G′ as follows³:


2 Intuitively this means to ignore the labels on vertices and edges.


3 Although the maximum number of edges in any (undirected) graph of order n is n(n − 1)/2, we
use n^2 for the denominator for simplicity.
$$ \mathrm{dist}(G, G') := \frac{m(G, G')}{n^2} \tag{2.1} $$

Note that 0 ≤ dist(G, G′) ≤ 1 for every G and G′. In the bounded-degree model and
the general-graph model, the distance is defined as follows⁴:

$$ \mathrm{dist}(G, G') := \frac{m(G, G')}{dn} \tag{2.2} $$

where d is the upper bound on the maximum (resp., the average) vertex-degrees for
the bounded-degree model (resp., the general-graph model).

By using the distance between graphs, the distance between a graph G ∈ Γ_n and
a property P is defined as follows:

$$ \mathrm{dist}(G, P) := \begin{cases} \min_{G' \in P_n} \mathrm{dist}(G, G') & \text{if } P_n \neq \emptyset, \\ \infty & \text{otherwise.} \end{cases} $$

This applies to all the models. For a real value 0 < ε ≤ 1, we say that G is ε-far
from G′ (resp., P) if dist(G, G′) > ε (resp., dist(G, P) > ε) and ε-close otherwise.
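
The two normalizations in (2.1) and (2.2) can be compared on a small example. The sketch below represents an undirected graph by its symmetric set of ordered pairs, so that the count follows the definition of m(G, G′) over V × V and each undirected edge contributes two pairs. The helper names are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def symmetric(edges: Iterable[Edge]) -> Set[Edge]:
    """Represent an undirected graph as a symmetric set of ordered pairs."""
    pairs = set(edges)
    return pairs | {(v, u) for (u, v) in pairs}


def edit_count(edges_g: Set[Edge], edges_h: Set[Edge]) -> int:
    """m(G, G'): ordered pairs on which the characteristic functions differ."""
    return len(edges_g ^ edges_h)


def dist_dense(n: int, edges_g: Set[Edge], edges_h: Set[Edge]) -> float:
    """Distance (2.1) in the dense-graph model."""
    return edit_count(edges_g, edges_h) / n ** 2


def dist_bounded(n: int, d: int, edges_g: Set[Edge], edges_h: Set[Edge]) -> float:
    """Distance (2.2) in the bounded-degree and general-graph models."""
    return edit_count(edges_g, edges_h) / (d * n)


# Two graphs on the vertex set {0, 1, 2, 3}.
G = symmetric({(0, 1), (1, 2)})
H = symmetric({(0, 1), (2, 3)})
```

Here G and H differ in one deleted and one inserted undirected edge, which gives m = 4 ordered pairs. The same pair of graphs is therefore at distance 4/16 in the dense-graph model but at distance 4/8 in the bounded-degree model with d = 2, which shows how strongly the notion of "far" depends on the model.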

A testing algorithm for a property P is an algorithm that, given query access (by the
oracles) to an instance G and given 0 < ε ≤ 1, accepts every G ∈ P with probability
at least 2/3, and rejects every G that is ε-far from P with probability at least 2/3.
If a testing algorithm accepts every G ∈ P with probability 1, then the algorithm is
called a one-sided-error algorithm. The number of queries made by an algorithm to the given
oracle is called the query complexity of the algorithm. If the query complexity of a
testing algorithm is bounded by a constant that is independent of n (but that may
depend on ε and d), then the algorithm is called a tester. A property is testable⁵ if
there is a tester for the property.


2.3 Important Known Results in Property Testing 
on Graphs 


This section gives a very brief overview of important known results in property testing 
on graphs, particularly on the characterization and general properties of testability. 
See a recent review [11] or books [4, 8] for details. 


4 Although the maximum number of edges of any d-bounded-degree (undirected) graph of order n
is dn/2, we use dn for the denominator for simplicity. 

5 Sometimes “testable” means that the problem has an algorithm with sublinear query complexity, 
and strongly testable may be used to distinguish constant query complexity from mere sublinear 
query complexity. 
2.3.1 Results for the Dense-Graph Model


Alon et al. [2] found a combinatorial characterization (necessary and sufficient con- 
dition) of testable properties for the dense-graph model. We first present the theorem 
without defining the terms used in it. 


Theorem 2.1 For the dense-graph model, a graph property is testable if and only if 
it is regular-reducible. 


This theorem relies on Szemerédi’s regularity lemma, a monumental and extremely
powerful result, which we now introduce briefly. For a pair of subsets of vertices A, B ⊆ V
of a graph G = (V, E), den(A, B) := |E(A, B)| / (|A||B|) is called the density of the pair,
where E(A, B) denotes the set of edges between A and B. A
family of subsets 𝒱 = {V_1, ..., V_k} (V_i ⊆ V, ∀i ∈ {1, ..., k}) is called a partition
of V if V_i ∩ V_j = ∅ for all 1 ≤ i < j ≤ k and V = V_1 ∪ ··· ∪ V_k. A partition 𝒱 =
{V_1, ..., V_k} of the vertex set of a graph is called an equipartition if |V_i| and |V_j|
differ by no more than 1 for all 1 ≤ i < j ≤ k.
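
The density of a pair and the equipartition condition are both easy to compute directly. The sketch below assumes an undirected graph stored as a symmetric set of ordered pairs and disjoint sets A and B; the function names are chosen here for illustration.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def density(edges: Set[Edge], A: Set[int], B: Set[int]) -> float:
    """den(A, B) = |E(A, B)| / (|A| |B|) for disjoint, non-empty A and B."""
    between = sum(1 for a in A for b in B if (a, b) in edges)
    return between / (len(A) * len(B))


def is_equipartition(vertices: Iterable[int], parts: List[Set[int]]) -> bool:
    """Check that parts is a partition of vertices with sizes differing by <= 1."""
    if not parts:
        return False
    union = set().union(*parts)
    disjoint = sum(len(p) for p in parts) == len(union)
    covers = union == set(vertices)
    sizes = [len(p) for p in parts]
    return disjoint and covers and max(sizes) - min(sizes) <= 1
```

For the complete bipartite graph between {0, 1} and {2, 3} the density of the pair is 1, and it drops to 0.75 after one of the four edges is removed. The partition {0, 1}, {2, 3}, {4} of five vertices is an equipartition, whereas {0, 1, 2}, {3}, {4} is not.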


Definition 2.1 (ε-regular) Let 0 < ε ≤ 1 be a real number and A, B ⊆ V. A
pair (A, B) is called ε-regular if |den(A, B) − den(X, Y)| ≤ ε for any two
subsets X ⊆ A and Y ⊆ B satisfying |X| ≥ ε|A| and |Y| ≥ ε|B|. An equipartition
𝒱 = {V_1, ..., V_k} of the vertex set of a graph is called ε-regular if all but at most
εk^2 of the pairs (V_i, V_j) (i, j ∈ {1, ..., k}) are ε-regular.


Definition 2.2 (regularity-instance) A regularity-instance R is given by an
error-parameter 0 < ε ≤ 1, an integer k, a set of (k choose 2) real numbers 0 ≤ η_{i,j} ≤ 1 indexed by
1 ≤ i < j ≤ k, and a set R̄ of pairs (i, j) of size at most εk^2. A graph is said to satisfy
the regularity-instance if it has an equipartition 𝒱 = {V_1, ..., V_k} such that for all
(i, j) ∉ R̄ the pair (V_i, V_j) is ε-regular and satisfies |E(V_i, V_j)| = η_{i,j}|V_i||V_j|. The
complexity of the regularity-instance is max(k, 1/ε).


Definition 2.3 (regular-reducible) A graph property $P$ is regular-reducible if for
any $\delta > 0$ there exists $r = r_P(\delta)$ such that for any $n$ there is a family $\mathcal{R}$ of at most $r$
regularity-instances, each of complexity at most $r$, such that the following holds for
every $\varepsilon > 0$ and every $n$-graph $G$:


1. If $G \in P$, then for some $R \in \mathcal{R}$, $G$ is $\delta$-close to $R$.
2. If $G$ is $\varepsilon$-far from $P$, then for any $R \in \mathcal{R}$, $G$ is $(\varepsilon - \delta)$-far from $R$.


Theorem 2.2 (Szemerédi's regularity lemma [2, 17]) For every pair of an integer
$t$ and a real number $\varepsilon > 0$ there exists an integer $T = T(t, \varepsilon)$ such that any graph
with $n \ge T$ vertices has an $\varepsilon$-regular equipartition of order $k$, where $t \le k \le T$.


An intuitive explanation of the regularity lemma is that, for any $\varepsilon > 0$, every
graph $G = (V, E)$ can be $\varepsilon$-approximated by a constant-sized edge-weighted graph,
where each edge weight approximates the density of the corresponding pair of vertex subsets.
Intuitively, a property being regular-reducible means that it can be represented by a
constant number of equipartitions based on the regularity lemma; in other words, the
regularity lemma can be used for testing the property. See also [11] for details.

Representative regular-reducible properties are monotone or hereditary proper- 
ties, which are defined as follows. 


Definition 2.4 A graph property $P$ is monotone if for every $G \in P$ and $e \in E_G$,
$G - \{e\} \in P$. A graph property $P$ is hereditary if for every $G \in P$ and $v \in V_G$,
$G - \{v\} \in P$.


Planarity, bipartiteness, $k$-colorability (for any $k \in \mathbb{N}$), $H$-freeness⁶ (for any graph
$H$), and disconnectedness are all monotone. The former four properties are also
hereditary, but the last one, disconnectedness, is not.⁷ A well-known non-monotone
and hereditary property is perfectness: A graph is said to be perfect if for every
induced subgraph, the chromatic number of the subgraph equals the clique number
(= the order of the largest clique) of the subgraph. Every monotone or hereditary
property is regular-reducible (see [2] for details).

We can say that Theorem 2.1 solves, in a sense, the problem of characterizing testable
properties in the dense-graph model. However, the constants that appear in the
algorithms obtained from Theorem 2.1 are incredibly (perhaps more than astronomically)
huge! Thus, developing faster algorithms (i.e., ones with smaller constant complexity)
remains an issue for each problem.


2.3.2 Results for the Bounded-Degree Model 


Whereas the combinatorial characterization of testable properties as shown in 
Theorem 2.1 was obtained for the dense-graph model, no perfect results have been 
obtained for the bounded-degree model despite many attempts to achieve this goal. 
However, progress is being made in steps. We now have an important characteriza- 
tion of testable properties in the bounded-degree model called “hyperfiniteness.” We 
also found another characterization called “forbidden configurations,” for one-sided 
error testability, which is explained in Sect. 2.4. 


Definition 2.5 Let $\varepsilon > 0$, $t > 0$, and $d > 0$. Let $G = (V, E)$ be a $d$-bounded-degree
$n$-graph. If one can remove at most $\varepsilon d n$ edges from $G$ such that each connected
component of the resulting graph has at most $t$ vertices, then $G$ is called $(\varepsilon, t)$-hyperfinite
(with respect to degree bound $d$). For a function $\rho : \mathbb{R}^{+} \to \mathbb{R}^{+}$, if $G$ is
$(\varepsilon, \rho(\varepsilon))$-hyperfinite for every $\varepsilon > 0$, then $G$ is called $\rho$-hyperfinite. A set $\mathcal{G}$ of
$d$-degree-bounded graphs is called $\rho$-hyperfinite if every $G \in \mathcal{G}$ is $\rho$-hyperfinite. $\mathcal{G}$ is called
hyperfinite if there is a function $\rho$ such that $\mathcal{G}$ is $\rho$-hyperfinite.
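As an illustration of this definition, the sketch below checks whether a proposed set of removed edges witnesses that a graph is $(\varepsilon, t)$-hyperfinite. It assumes vertices labelled 0 to n - 1 and edges given as pairs; the function names are ours. Note that it only verifies a given witness and does not search for one.

```python
from collections import defaultdict


def component_sizes(n, edges):
    """Sizes of the connected components of a graph on vertices 0..n-1."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, sizes = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sizes


def witnesses_hyperfinite(n, d, edges, removed, eps, t):
    """True if deleting `removed` (at most eps*d*n edges) leaves components of size <= t."""
    removed = {frozenset(e) for e in removed}
    if len(removed) > eps * d * n:
        return False
    kept = [e for e in edges if frozenset(e) not in removed]
    return max(component_sizes(n, kept)) <= t
```

For a path on six vertices with degree bound 2, removing the middle edge is a valid witness for $(0.1, 3)$-hyperfiniteness, since one edge is within the budget $0.1 \cdot 2 \cdot 6 = 1.2$ and both remaining components have three vertices.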


Newman and Sohler [15] presented the following theorem. 


6 If a graph includes no $H$ as a subgraph, then it is called $H$-free.


7 A graph consisting of one connected component of order $n - 1$ and one isolated vertex is
disconnected, but removing the isolated vertex from the graph makes it connected.

Theorem 2.3 In the bounded-degree model, every graph property is testable for any
hyperfinite family of graphs.


While this is a sufficient condition, the following necessary condition related to 
hyperfiniteness was obtained by Fichtenberger et al. [5]. 


Definition 2.6 A subproperty of a property $P$ is a property that is a subset of $P$. A
property is non-trivially testable if it is testable and there exists $\varepsilon > 0$ such that there
is an infinite number of graphs that are $\varepsilon$-far from the property.


Theorem 2.4 Every testable property of bounded-degree graphs is either finite or 
contains an infinite hyperfinite subproperty. Furthermore, the complement of every 
non-trivially testable graph property contains an infinite hyperfinite subproperty. 


These theorems show that there is a deep relation between hyperfiniteness and
testability on bounded-degree graphs. However, no necessary and sufficient condition
for the testability of graph properties has been found, even for one-sided error. Recently,
we found necessary and sufficient conditions for one-sided-error testability on subclasses of
properties of digraphs⁸ [12]. This result was obtained through the ABD Project and is
explained in Sect. 2.4.


2.3.3 Results for the General-Graph Model 


There were previously no general classes of testable properties for the general-graph
model. Through the ABD Project, a class of graphs that models complex networks, called
Hierarchical Scale Free (HSF), was found on which every property is testable. We present
an outline of the result below; the details are available in [10, 11].


Definition 2.7 For positive real numbers $c > 0$ and $\gamma > 1$, a class of scale-free
(multi)graphs $\mathcal{SF}(c, \gamma)$ consists of (multi)graphs $G = (V, E)$ for which the following
condition holds: Let $\nu_i$ be the number of vertices $v$ of degree $i$. Then:


$$\nu_i \le c\,n\,i^{-\gamma}, \quad \forall i \in \{2, 3, \ldots\}. \tag{2.3}$$


A clique is a subgraph in which there exists an edge between every pair of vertices.
For a nonnegative integer $c \ge 0$, a $c$-isolated clique is a clique such that the number
of outgoing edges (edges between the clique and the other vertices) is less than $ck$,
where $k$ is the number of vertices of the clique. A 1-isolated clique is sometimes
simply called an isolated clique (see [9] for details). Let $\mathcal{E}(G)$ be the graph obtained
from $G$ by contracting all isolated cliques.⁹


8 Note that any undirected graph can be represented by a digraph, i.e., the set of digraphs can be 
regarded as including the set of undirected graphs. 

9 Two distinct isolated cliques never overlap, except in the special case of double-isolated-cliques,
which consist of two isolated cliques of size $k$ that share $k - 1$ vertices. A double-isolated-clique

Definition 2.8 For positive real numbers $c > 0$, $\gamma > 1$ and a positive integer $n_0 \ge 1$,
a class of hierarchical scale-free (multi)graphs $\mathcal{HSF} = \mathcal{HSF}(c, \gamma, n_0)$ consists of
(multi)graphs $G = (V, E)$ for which the following conditions hold:


(i) $G \in \mathcal{SF}(c, \gamma)$,

(ii) Consider the infinite sequence of graphs $G_0 = G$, $G_1 = \mathcal{E}(G_0)$, $G_2 = \mathcal{E}(G_1)$,
.... If $|V_{G_i}| \ge n_0$, then $G_i$ includes at least one isolated clique $Q \subseteq V$ with $|Q| \ge 2$.
(Note that if $G_i$ has no such isolated clique, then $G_i = G_{i+1} = G_{i+2} = \cdots$.)


For a graph $G$ and a nonnegative integer $d \ge 0$, $G|d$ is the graph obtained by
deleting all edges incident to each vertex $v$ of degree more than $d$. Note that $G|d$ is
a $d$-bounded-degree graph. The following property was obtained in [10].


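Lemma 2.1 For every $\mathcal{SF} = \mathcal{SF}(c, \gamma)$ with $\gamma > 2$, and every positive real number
$\varepsilon > 0$, there exists a constant $\delta = \delta(\varepsilon, c, \gamma)$ such that for every graph $G \in \mathcal{SF}$, $G|\delta$
is $\varepsilon$-close to $G$.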
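The truncation $G|d$ used in this lemma is simple to compute. The following minimal sketch assumes a (multi)graph given as a list of vertex pairs; the name `truncate` is ours. It returns the edge list of $G|d$ together with the number of deleted edges, which is the quantity that the lemma bounds relative to the size of the graph.

```python
from collections import Counter


def truncate(edges, d):
    """Return (edges of G|d, number of deleted edges).

    Every edge incident to a vertex whose degree in G exceeds d is deleted,
    so all remaining vertices have degree at most d.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    kept = [(u, v) for u, v in edges if deg[u] <= d and deg[v] <= d]
    return kept, len(edges) - len(kept)
```

For a star with five leaves plus one disjoint edge, truncating at $d = 2$ deletes the five star edges and keeps only the disjoint edge.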


This lemma looks useful since it means that for any $\varepsilon > 0$, any scale-free graph
is $\varepsilon$-close to a bounded-degree graph. This lemma is applied in the proof of the
following theorem, which is the main theorem of [10].


Theorem 2.5 Every property is testable for $\mathcal{HSF}(c, \gamma, n_0)$ with $\gamma > 2$.


In the general-graph model, no other universal (constant-time) tester is known,
but universal testing algorithms with $\mathrm{polylog}(n)$ query complexity have been
found for forests [14] and outerplanar graphs [3].


2.4 Characterization of Testability on Bounded-Degree 
Digraphs 


2.4.1 Bounded-Degree Model of Digraphs 


As mentioned previously, there is no complete characterization of testable graph
properties in bounded-degree graphs, even for one-sided error. Through the ABD
Project, however, we have obtained a characterization of one-sided-error testability
for monotone and hereditary properties of bounded-degree digraphs [12],
which we briefly explain in this section. The set of digraphs can be regarded as including
the set of undirected graphs by imposing symmetry, i.e., $\forall u, v \in V$, if $(u, v) \in E$,
then $(v, u) \in E$.

In this section, we consider the bounded-degree model on digraphs. For a digraph
$G = (V, E)$ and a vertex $v \in V$, we denote by $N_G^{+}(v)$ the set of outgoing neighbours
of $v$, i.e., $N_G^{+}(v) := \{u \in V \mid (v, u) \in E\}$. Similarly, $N_G^{-}(v) := \{u \in V \mid (u, v) \in E\}$
and $N_G(v) := N_G^{+}(v) \cup N_G^{-}(v)$. The out-degree of $v$ is $\deg_G^{+}(v) := |N_G^{+}(v)|$, and the
in-degree of $v$ is $\deg_G^{-}(v) := |N_G^{-}(v)|$. The subscript $G$ can be omitted if it is clear.


$Q$ has no edge between $Q$ and the other part of the graph (i.e., $\deg_G(Q) = 0$), and thus we specially
define that a double-isolated-clique in $G$ is contracted into a vertex in $\mathcal{E}(G)$. Under this assumption,
$\mathcal{E}(G)$ is uniquely defined.

For a (di)graph $G = (V, E)$ and $F \subseteq E$, we denote by $G - F$ the graph $(V, E - F)$.
For a (di)graph $G = (V, E)$ and $W \subseteq V$, we denote by $G[W]$ the subgraph of
$G$ induced by $W$ (i.e., $G[W]$ contains all edges in $E_G$ both of whose endpoints are in
$W$). $G[V - W]$ can be denoted by $G - W$.

In the bounded-degree model for digraphs, there are two submodels: In one,
only the out-degree is bounded; in the other, both the in-degree and out-degree are
bounded.¹⁰ The former case is represented by the $F(d)$ model and the latter by the $FB(d)$
model.¹¹ The $F(d)$ model is clearly wider than the $FB(d)$ model. Moreover, every
undirected $d$-bounded-degree graph can be formulated in the $FB(d)$ model by replacing
each undirected edge by a pair of anti-parallel directed edges. That is, the $FB(d)$
model (and thus the $F(d)$ model as well) is regarded as including the undirected
$d$-bounded-degree model.


2.4.2 Monotone Properties and Hereditary Properties 


This section extends the monotone and hereditary properties that were defined in 
Definition 2.4 to digraphs. 

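We first introduce the following notation for characterizing the testability of these
properties. Let $\mathcal{H}$ be a set of digraphs. We call $\mathcal{H}$ an $r$-set if every member $H \in \mathcal{H}$
has at most $r$ vertices (i.e., $H$ is an $r'$-digraph for some $r' \le r$). A digraph $G$ is
$\mathcal{H}$-free if for every $H \in \mathcal{H}$, $G$ contains no subgraph that is isomorphic to $H$. A digraph
$G$ is induced $\mathcal{H}$-free if for every $H \in \mathcal{H}$, $G$ contains no induced subgraph that is
isomorphic to $H$. We denote by $\mathcal{P}_{\mathcal{H}}$ (resp., $\mathcal{P}^{*}_{\mathcal{H}}$) the property that contains all digraphs
that are $\mathcal{H}$-free (resp., induced $\mathcal{H}$-free). $\mathcal{P}_{\mathcal{H},n}$ (resp., $\mathcal{P}^{*}_{\mathcal{H},n}$) is the subproperty of $\mathcal{P}_{\mathcal{H}}$
(resp., $\mathcal{P}^{*}_{\mathcal{H}}$) that consists of all $n$-digraphs in $\mathcal{P}_{\mathcal{H}}$ (resp., $\mathcal{P}^{*}_{\mathcal{H}}$). We can easily confirm that $\mathcal{P}_{\mathcal{H}}$ is
monotone and $\mathcal{P}^{*}_{\mathcal{H}}$ is hereditary for any $\mathcal{H}$.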

Let $H = (V, E)$ be a digraph. For a subset $W \subseteq V$, if by disregarding the
directions of the edges of $H$, $W$ induces a connected component in the resulting undirected
graph, then we say that $H[W]$, which is the directed subgraph of $H$ induced by $W$, is
a component of $H$. A digraph $H$ is rooted if every component $H'$ of $H$ has a vertex
$v$ such that for every $u \in V_{H'}$ there exists a dipath (= directed path) from $v$ to $u$.
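The definition of a rooted digraph can be checked directly. The following is a minimal sketch, assuming a digraph given by a vertex list and a list of arcs; the function name `is_rooted` is ours. It finds each component by ignoring edge directions and then looks for a vertex from which every vertex of that component is reachable along dipaths.

```python
def is_rooted(vertices, arcs):
    """True if every component of the digraph has a vertex that reaches all others."""
    out = {v: [] for v in vertices}    # directed adjacency
    und = {v: set() for v in vertices}  # adjacency ignoring directions
    for u, v in arcs:
        out[u].append(v)
        und[u].add(v)
        und[v].add(u)

    def reach(start, adj):
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    unassigned = set(vertices)
    while unassigned:
        comp = reach(next(iter(unassigned)), und)
        # A root is a vertex whose directed reach covers the whole component.
        if not any(reach(v, out) == comp for v in comp):
            return False
        unassigned -= comp
    return True
```

A directed path and a dicycle are rooted, whereas two arcs pointing into a common vertex form a digraph that is not rooted, since no vertex reaches both of the others.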


10 Clearly the case in which only the in-degree is bounded can be formulated by the model in which 
only the out-degree is bounded by changing the edge direction. 


11 F and B mean forward and backward, respectively.


2.4.3 Characterizations


By using these terms, the characterizations of testable monotone or hereditary prop- 
erties for the F (d) model were given in [12]. 


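Theorem 2.6 Let $\mathcal{P} = \bigcup_{n \in \mathbb{N}} \mathcal{P}_n$ be a monotone property in the $F(d)$-model. Then $\mathcal{P}$
is testable if and only if there is a function $r : (0, 1) \to \mathbb{N}$ such that for any $0 < \varepsilon < 1$
and $n \in \mathbb{N}$, there is an $r(\varepsilon)$-set of rooted digraphs $\mathcal{H}_n$ such that the property $\mathcal{P}_{\mathcal{H}_n,n}$
satisfies the following two conditions:

(a) $\mathcal{P}_n \subseteq \mathcal{P}_{\mathcal{H}_n,n}$

(b) $\mathcal{P}_{\mathcal{H}_n,n}$ is $\varepsilon/2$-close to $\mathcal{P}_n$.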


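Theorem 2.7 Let $\mathcal{P}$ be a hereditary property in the $F(d)$-model. Then $\mathcal{P}$ is testable
if and only if there are functions $r : (0, 1) \to \mathbb{N}$ and $N : (0, 1) \to \mathbb{N}$ such that for any
$0 < \varepsilon < 1$, there is an $r(\varepsilon)$-set of rooted digraphs $\mathcal{H}$ such that for every $n \ge N(\varepsilon)$,
$\mathcal{P}^{*}_{\mathcal{H},n}$ satisfies the following two conditions:

(a) $\mathcal{P}_n \subseteq \mathcal{P}^{*}_{\mathcal{H},n}$

(b) $\mathcal{P}^{*}_{\mathcal{H},n}$ is $\varepsilon/2$-close to $\mathcal{P}_n$.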


Condition (b) in both Theorems 2.6 and 2.7 is necessary, since there exists a
monotone and hereditary property that is testable with one-sided error and has no
$\mathcal{H}_n$ such that $|\mathcal{H}_n|$ is bounded by a constant ($r(\varepsilon)$) and $\mathcal{P}_n = \mathcal{P}_{\mathcal{H}_n,n}$ or $\mathcal{P}_n = \mathcal{P}^{*}_{\mathcal{H}_n,n}$.
One of these properties is $\mathcal{P}_{\mathcal{C}_{\sqrt{n}}}$ ($= \mathcal{P}^{*}_{\mathcal{C}_{\sqrt{n}}}$) on the $F(1)$-model,¹² where $\mathcal{C}_k$ is the set
of directed cycles (or dicycles, for short) of length in $[3, k]$, i.e., $\mathcal{P}_{\mathcal{C}_{\sqrt{n}}}$ is the property
of having no dicycle of length in $[3, \sqrt{n}]$. This property is clearly monotone and
hereditary. To express $\mathcal{P}_{\mathcal{C}_{\sqrt{n}}}$ by using a set $\mathcal{H}$ of forbidden subgraphs (or forbidden
induced subgraphs), $\mathcal{H}$ must include $\mathcal{C}_{\sqrt{n}}$, and thus $|\mathcal{H}|$ cannot be bounded by any
constant. However, this property is testable with one-sided error as shown below.


Lemma 2.2 $\mathcal{P}_{\mathcal{C}_{\sqrt{n}}}$ on the $F(1)$-model is one-sided-error testable with query
complexity $O(\varepsilon^{-2})$.


To prove this lemma, we will use the following lemma, which is often effective 
for estimating the query complexity of testers. 


Lemma 2.3 For any real number $x$, the following inequality holds:

$$e^{x} \ge x + 1. \tag{2.4}$$


The proof of this lemma follows immediately by differentiating $e^{x} - (x + 1)$, and is
omitted here.


Proof of Lemma 2.2: If $n < 2/\varepsilon$, then we can get the complete data of the graph in
time $2/\varepsilon$. Thus it is enough to consider the case of $n \ge 2/\varepsilon$. We use the following
algorithm for the tester:


12 $\mathcal{P}_{\mathcal{C}_k} \ne \mathcal{P}^{*}_{\mathcal{C}_k}$ on the $F(d)$-model for $d \ge 2$.

Choose $s = 2/\varepsilon$ vertices $v_1, \ldots, v_s$ from $V$ uniformly at random, and denote them
by $S$. For each $v_i \in S$, check whether there is a dicycle of length at most $s$ that
includes $v_i$ by following each outgoing edge successively whenever it exists. (Note
that in the $F(1)$-model, the outgoing edge of each vertex is unique if it exists.)
If a dicycle of length in $[3, s]$ is found, then the input is rejected; otherwise, it is accepted.
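The sampling procedure above can be sketched in Python as follows. This is an illustrative sketch under our own assumptions: the graph is accessed through a hypothetical oracle `out_edge(v)` that returns the unique out-neighbour of `v` or `None`, vertices are labelled 0 to n - 1, and the walk length is additionally capped at $\lfloor\sqrt{n}\rfloor$ so that only dicycles forbidden by the property cause a rejection. The small case $n < 2/\varepsilon$, where the whole graph is read, is not handled.

```python
import math
import random


def accepts_no_short_dicycle(n, out_edge, eps, rng=random):
    """One-sided-error tester sketch for 'no dicycle of length in [3, sqrt(n)]'.

    Returns False (reject) only after actually finding such a dicycle.
    """
    s = math.ceil(2 / eps)
    max_len = min(s, math.isqrt(n))  # assumption: never look past sqrt(n)
    for _ in range(s):
        v = rng.randrange(n)
        u = v
        for length in range(1, max_len + 1):
            u = out_edge(u)          # one query per step
            if u is None:
                break                # the walk from v ends, so v is on no dicycle
            if u == v:
                if length >= 3:
                    return False     # found a forbidden dicycle through v
                break                # v lies on a dicycle of length 1 or 2
    return True
```

Each sampled vertex costs at most $s$ queries, so the total number of queries is at most $s^2 = O(\varepsilon^{-2})$, in line with Lemma 2.2. On an input consisting only of disjoint dicycles of length 4, every sampled vertex lies on a short dicycle, so the sketch rejects regardless of the random choices.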

We show that the above algorithm is the desired one-sided-error tester. It clearly
has one-sided error, since it never rejects without finding a short (i.e., of length at most
$s$) dicycle. Thus, it is enough to show that the algorithm rejects with probability at
least 2/3 if the input is $\varepsilon$-far from $\mathcal{P}_{\mathcal{C}_{\sqrt{n}}}$.

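Assume that the input $G = (V, E)$ is $\varepsilon$-far from $\mathcal{P}_{\mathcal{C}_{\sqrt{n}}}$, i.e., that $G$ contains more
than $\varepsilon n$ dicycles of length in $[3, \sqrt{n}]$. Let $\mathcal{C}$ be the set of such dicycles. We divide $\mathcal{C}$
into the following two sets:

$\mathcal{C}_{\mathrm{short}} = \{C \in \mathcal{C} \mid \text{the length of } C \text{ is at most } s\}$,

$\mathcal{C}_{\mathrm{long}} = \{C \in \mathcal{C} \mid \text{the length of } C \text{ is more than } s \text{ (and at most } \sqrt{n})\}$.

From $|\mathcal{C}| > \varepsilon n$, $|\mathcal{C}_{\mathrm{short}}| > \varepsilon n/2$ or $|\mathcal{C}_{\mathrm{long}}| > \varepsilon n/2$ holds.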

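First, we assume that $|\mathcal{C}_{\mathrm{long}}| > \varepsilon n/2$. Clearly no pair of dicycles in $\mathcal{C}$ shares a
common vertex, and thus more than $\varepsilon s n/2 = n$ vertices are included in the graph,
a contradiction. Thus, $|\mathcal{C}_{\mathrm{long}}| \le \varepsilon n/2$.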

From this, it follows that $|\mathcal{C}_{\mathrm{short}}| > \varepsilon n/2$. Since no pair of dicycles in $\mathcal{C}$ shares a
common vertex and each dicycle has at least three vertices, the dicycles in $\mathcal{C}_{\mathrm{short}}$
contain more than $3\varepsilon n/2$ vertices. Let $W$ be the set of such vertices. If the algorithm
finds at least one vertex from $W$, then it will find a short dicycle that includes the
vertex and reject the input. From $|W| > 3\varepsilon n/2$, it follows that the probability that
a chosen vertex is not in $W$ is less than $1 - 3\varepsilon/2$. Thus, the probability that all of the $s$
vertices chosen by the algorithm are not in $W$ is less than

$$\left(1 - \frac{3\varepsilon}{2}\right)^{s} \le \left(e^{-3\varepsilon/2}\right)^{2/\varepsilon} = e^{-3} < \frac{1}{3}.$$

Note that the first inequality above uses the inequality (2.4). The probability that the
algorithm finds at least one vertex from $W$ is, therefore, more than 2/3. The query
complexity of this tester is clearly $O(\varepsilon^{-2})$.

Since $\mathcal{P}_{\mathcal{C}_{\sqrt{n}}}$ is both monotone and hereditary, Theorems 2.6 and 2.7 hold. If we
apply Theorem 2.6, then $\mathcal{H}_n = \mathcal{C}_{\min(2/\varepsilon, \sqrt{n})}$ for each $n$. If we apply Theorem 2.7, then
$N(\varepsilon) = 4/\varepsilon^{2}$ and $\mathcal{H} = \mathcal{C}_{2/\varepsilon}$. From this discussion, we observe that $N(\varepsilon)$ is essential
in Theorem 2.7.


2.4.4 An Idea to Extend the Characterizations Beyond 
Monotone and Hereditary 


We would like to extend Theorems 2.6 and 2.7 to general properties. We denote
by $\mathcal{P}_{\deg^{+}(d-1)}$ the property consisting of digraphs having no vertex with out-degree
$d - 1$ on the $F(d)$-model. $\mathcal{P}_{\deg^{+}(d-1)}$ is one-sided-error testable as shown below.

Let $G = (V, E)$ be an input. The algorithm for $\mathcal{P}_{\deg^{+}(d-1)}$ chooses $2/\varepsilon$ vertices
from $V$ uniformly at random and checks their out-degrees. If it finds a vertex of
out-degree $d - 1$, then the input is rejected; otherwise, it is accepted. This algorithm has
one-sided error, since it never rejects if there is no vertex of out-degree $d - 1$. If $G$ is $\varepsilon$-far
from $\mathcal{P}_{\deg^{+}(d-1)}$, then there are more than $\varepsilon n$ vertices of out-degree $d - 1$. Thus, the
probability that there is no vertex of out-degree $d - 1$ among the $2/\varepsilon$ vertices selected by
the algorithm is less than

$$(1 - \varepsilon)^{2/\varepsilon} \le \left(e^{-\varepsilon}\right)^{2/\varepsilon} = e^{-2} < \frac{1}{3}.$$

Note that this also uses the inequality (2.4).

Hence, the above algorithm is a one-sided-error tester for $\mathcal{P}_{\deg^{+}(d-1)}$. However,
expressing this property by using forbidden subgraphs or forbidden induced subgraphs
as in Theorems 2.6 or 2.7 is impossible.¹³
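The out-degree tester described above is short enough to write out in full. The sketch below assumes a hypothetical oracle `out_degree(v)` and vertices labelled 0 to n - 1; both names are ours. The helper `miss_probability_bound` simply evaluates the quantity $(1 - \varepsilon)^{2/\varepsilon}$ from the analysis.

```python
import math
import random


def accepts_no_outdegree(n, out_degree, d, eps, rng=random):
    """One-sided-error tester sketch: reject only if a vertex of out-degree d-1 is seen."""
    for _ in range(math.ceil(2 / eps)):
        if out_degree(rng.randrange(n)) == d - 1:
            return False
    return True


def miss_probability_bound(eps):
    """Upper bound (1 - eps)^(2/eps) on missing all bad vertices when the input is eps-far."""
    return (1 - eps) ** (2 / eps)
```

By inequality (2.4), the value returned by `miss_probability_bound` is at most $e^{-2}$ for every $0 < \varepsilon < 1$, which is below 1/3.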

To extend the idea of "forbidden something" to non-monotone and non-hereditary 
properties, we [12] introduced the idea of “configurations,” by generalizing subgraphs 
and induced subgraphs. A similar idea has also appeared in [16]. 


Definition 2.9 A configuration is a pair $O = (H, L)$, where $H = (W, F)$ is a
digraph in the $F(d)$-model, $L : W \to \{\text{developed}, \text{frontier}\}$ is a function, and the
out-degree of every frontier vertex is 0. The configuration is rooted if $H$ is rooted.


Definition 2.10 Let $O = (H = (W, F), L)$ and $G = (V, E)$ be a configuration and
a graph, respectively, in the $F(d)$-model. We say that $G$ has an $O$-appearance if
there is an injective mapping $\phi : W \to V$ satisfying the condition that for every $v \in W$ with
$L(v) = \text{developed}$, the following two conditions hold:


(i) $\forall u \in W$, $(v, u) \in F$ if and only if $(\phi(v), \phi(u)) \in E$.
(ii) If $(\phi(v), x) \in E$, then $\exists u \in W$, $\phi(u) = x$.


We say that $G$ is $O$-free if $G$ has no $O$-appearance. For a set $\mathcal{O}$ of configurations, we
say that $G$ is $\mathcal{O}$-free if $\forall O \in \mathcal{O}$, $G$ is $O$-free.


As we have already stated, $\mathcal{P}_{\deg^{+}(d-1)}$ cannot be defined by any set of
forbidden subgraphs or induced subgraphs. However, it can be defined by using
$O$-freeness. That is, let $O_{\deg^{+}(d-1)} = (H = (W, F), L)$ be a configuration such
that $W = \{v_0, v_1, \ldots, v_{d-1}\}$, $F = \{(v_0, v_1), (v_0, v_2), \ldots, (v_0, v_{d-1})\}$, $L(v_0) =
\text{developed}$, and $L(v_1) = L(v_2) = \cdots = L(v_{d-1}) = \text{frontier}$. Then $\mathcal{P}_{\deg^{+}(d-1)}$ is defined
by the set of $O_{\deg^{+}(d-1)}$-free graphs.
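To see how an appearance of a configuration differs from an ordinary subgraph, the following brute-force sketch checks the two conditions of Definition 2.10 over all injective mappings. It assumes simple digraphs given as vertex lists and arc sets; the function name and the data layout are ours, and the running time is exponential, so it is meant only for small examples.

```python
from itertools import permutations


def has_appearance(W, F, L, V, E):
    """Brute-force test for an appearance of the configuration ((W, F), L) in (V, E)."""
    F, E = set(F), set(E)
    out_nbrs = {x: {y for (a, y) in E if a == x} for x in V}
    for image in permutations(V, len(W)):   # all injective maps W -> V
        phi = dict(zip(W, image))
        used = set(image)
        ok = True
        for v in W:
            if L[v] != "developed":
                continue
            # (i): arcs leaving v inside W must match arcs leaving phi(v) inside the image
            if any(((v, u) in F) != ((phi[v], phi[u]) in E) for u in W):
                ok = False
                break
            # (ii): every out-neighbour of phi(v) must lie in the image
            if not out_nbrs[phi[v]] <= used:
                ok = False
                break
        if ok:
            return True
    return False
```

With $d = 3$, the configuration with one developed vertex and two frontier out-neighbours appears in a digraph exactly when some vertex has out-degree 2. A vertex of out-degree 3 does not produce an appearance because condition (ii) fails, which is precisely what a forbidden subgraph cannot express.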

The idea of configuration-free (or forbidden configurations) may work for char- 
acterizing general one-sided-error testable properties on the F (d)-model. See [12] 
for details. 


13 This follows from the fact that $\mathcal{P}_{\deg^{+}(d-1)}$ is neither monotone nor hereditary, and from
Theorems 2.6 and 2.7.


2.5 Testable EXPTIME-Complete Games


This section presents results on the testability of combinatorial games, in particular
generalized chess, Shogi (Japanese chess), and Xiangqi (Chinese chess). Given
a position on a $\sqrt{n} \times \sqrt{n}$ board with $O(n)$ pieces, the generalized chess, Shogi,
and Xiangqi problems ask whether the position has the property that "the player
who moves first has a winning strategy." These problems are known or believed to be
EXPTIME-complete [1, 6, 7]. In [13], we showed that this property is testable for
chess, Shogi, and Xiangqi. The Shogi and Xiangqi testers are one-sided-error
testers, and, surprisingly, the chess tester is a no-error tester. Many problems have
been shown to be testable, but most of them belong to the class NP. We think
that this is the first result on the constant-time testability of EXPTIME-complete
problems. We mainly focus on chess, followed
by Shogi, but omit the explanation for Xiangqi since the method is similar to the one
for Shogi. See [13] for details.


2.5.1 Definitions 


We begin by focusing mainly on generalized chess. Generalized chess is played on a
$\sqrt{n} \times \sqrt{n}$ board with $O(n)$ pieces, including two kings. White moves first and black
plays after white. A position is defined by fixing each piece to a particular cell on
the board. For a given position $S$, the problem is to determine whether white wins
if both players play optimally. The basic rules are the same as those of ordinary
chess and are omitted here.

In chess, there are six different types of pieces: king (K), queen (Q), bishop (B),
knight (N), rook (R), and pawn (P). There are exactly two kings: one white and
one black. For each of the other piece-types (i.e., queen, bishop, knight, rook, and pawn),
there exist at most $cn$ pieces for each of white and black, where $c$ is a
constant. Piece-numbers from 1 to $cn$ are given to each white or black piece of each
piece-type; in other words, each piece has its own piece ID $(k, o, \ell)$ comprising a
piece-type $k \in \{K, Q, B, N, R, P\}$, an owner-color $o \in \{\text{white}, \text{black}\}$, and a
piece-number $\ell \in \{1, \ldots, cn\}$.

An algorithm can find the given position through the following oracles. 


Piece oracle: Given a piece ID $(k, o, \ell)$, the piece oracle answers an ordered pair
$(i, j)$ that provides the cell $(i, j)$, $i, j \in \{0, 1, \ldots, \sqrt{n}\}$, where it lies. ($i$ and $j$
represent the column number and row number, respectively, and $i = j = 0$
denotes that the piece is not in the game; such a piece is called an unused piece.)
This oracle is expressed as $q_1(k, o, \ell) = (i, j)$.

Coordinate oracle: Given a coordinate $(i, j)$, $i, j \in \{0, 1, \ldots, \sqrt{n}\}$, the coordinate
oracle answers the piece ID $(k, o, \ell)$ of the piece that lies on the cell if one exists. If
no piece lies on the cell, the oracle answers $k = o = \ell = 0$. This oracle is expressed
as $q_2(i, j) = (k, o, \ell)$.
When we explicitly identify position $S$, we express the oracles as $q_1(k, o, \ell; S)$
and $q_2(i, j; S)$, respectively. We introduce the assumption that all pieces can be
arranged on the board simultaneously, and it thus follows that $2 \times (5cn + 1) \le n$.
For simplicity, we assume that

$$c \le 1/11. \tag{2.5}$$


A position $S$ is called a winner if white has a winning strategy (i.e., white will win if
the players start from $S$ and play optimally) and a loser otherwise. Note that a loser
includes not only the cases where white loses but also those where the game ends in a draw.
A position is fixed by querying the piece oracle for every piece. The number of
different queries for the piece oracle is at most $n$, and thus a position is fixed by
at most $n$ data items. From this, we define the distance between positions $S$ and $S'$ as

$$\mathrm{dist}(S, S') := \frac{|\{(i, j) \mid q_2(i, j; S) \ne q_2(i, j; S')\}|}{n}. \tag{2.6}$$

Clearly $0 \le \mathrm{dist}(S, S') \le 1$.

Positions $S$ and $S'$ are called isomorphic if we can make $S$ identical to $S'$ by only
changing their piece-numbers (neither changing the piece-type nor owner-color is
allowed). A set of positions that is closed under isomorphism is called a property.
The distance between a position $S$ and a property $P$ is defined as follows:

$$\mathrm{dist}(S, P) := \min_{S' \in P} \mathrm{dist}(S, S'). \tag{2.7}$$

For a positive $\varepsilon > 0$, $S$ is $\varepsilon$-far from $P$ if $\mathrm{dist}(S, P) > \varepsilon$; otherwise, it is $\varepsilon$-close. Let
$W$ be the set of winners. $W$ is clearly closed under isomorphism and thus $W$ is a
property.
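The coordinate oracle and the distance (2.6) can be illustrated with a small sketch. We assume, purely for illustration, that a position is stored as a dictionary mapping occupied cells $(i, j)$ to piece IDs; the names `q2` and `dist` mirror the notation above but the data layout is ours. Note that computing the distance exactly reads all $n$ cells, so this only illustrates the definition and is not a sublinear procedure.

```python
import math


def q2(position, i, j):
    """Coordinate oracle: piece ID on cell (i, j), or (0, 0, 0) if the cell is empty."""
    return position.get((i, j), (0, 0, 0))


def dist(S, T, n):
    """Distance (2.6): fraction of the n cells on which the two positions differ."""
    side = math.isqrt(n)
    differing = sum(
        1
        for i in range(1, side + 1)
        for j in range(1, side + 1)
        if q2(S, i, j) != q2(T, i, j)
    )
    return differing / n
```

For example, on a $4 \times 4$ board ($n = 16$), moving a single piece to an empty cell changes two cells and therefore yields distance $2/16$.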

For generalized Shogi and Xiangqi, similar definitions are used. They can be 
easily deduced and are omitted here. See [13] for details. 


2.5.2 Testers for Generalized Chess, Shogi, and Xiangqi 


The following theorem was presented in [13]. Note that a no-error tester is a one- 
sided-error tester that always rejects every input that is e-far from the property; that 
is, it always accepts or rejects with no-error if the input is in the property or e-far 
from the property. 


Theorem 2.8 There exists a no-error tester with query complexity $O(\varepsilon^{-1})$ for the
generalized chess problem, there exists a one-sided-error tester with query complexity
$O(\varepsilon^{-2})$ for the generalized Shogi problem, and there exists a one-sided-error tester
with query complexity $O(\varepsilon^{-1})$ for the generalized Xiangqi problem.
Fig. 2.1 The black king will be checkmated by white's next move, as indicated by
the arrow


Proof of the chess part of Theorem 2.8 Let $S$ be a given position. Let $S'$ be the
position made from $S$ by changing the pieces in cells $(i, j)$, $i \in \{1, 2, 3, 4\}$ and
$j \in \{1, 2, 3, 4, 5\}$, as shown in Fig. 2.1.

The pieces that were in these cells in $S$ are changed to be unused pieces, and
the pieces that appear in these cells in $S'$ are moved from other cells or are unused
pieces. In $S'$, the white king is safe and the black king will be checkmated by white's
next move (moving the queen from (3, 2) to (2, 2)), meaning that $S'$ is a winner.
The number of cells in which $S$ and $S'$ differ is at most $20 + 8 = 28$. Thus, if $n \ge 28/\varepsilon$, then
$\mathrm{dist}(S, S') \le 28/n \le \varepsilon$. Hence, $S$ is $\varepsilon$-close to $W$, and it is sufficient to accept it. If
$n < 28/\varepsilon$, it is sufficient to read all of the information by calling the piece oracle for
all pieces, which requires $O(\varepsilon^{-1})$ queries.

This algorithm always accepts a winner. Moreover, if a given position $S$ is $\varepsilon$-far
from $W$, then $n < 28/\varepsilon$ and the algorithm knows the complete information for $S$.
Therefore, this algorithm is no-error.
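The structure of this no-error tester can be summarized in a few lines. The sketch below is only schematic: `q1` stands for the piece oracle, `piece_ids` for the list of all piece IDs, and `is_winner` for a hypothetical exact solver that decides whether a fully known position is a winner (which may take exponential time; only the number of oracle queries is counted here). All three names are our assumptions.

```python
def chess_tester(n, eps, piece_ids, q1, is_winner):
    """Return (accept, number_of_queries) following the case distinction in the proof."""
    if n >= 28 / eps:
        # Every position is eps-close to a winner, so accepting is always safe.
        return True, 0
    # Small board: read the whole position with one piece-oracle query per piece.
    position = {pid: q1(pid) for pid in piece_ids}
    return is_winner(position), len(position)
```

On a large board the tester accepts without any query; on a small board it uses one query per piece, that is, $O(n) = O(\varepsilon^{-1})$ queries.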


The algorithms for the generalized Shogi and Xiangqi problems are a little more
complicated. The reason is that in Shogi and Xiangqi there are fouls based on
positions. A player who commits a foul loses. The following fouls need to be
considered in the generalized Shogi problem.


• Nifu (double pawn): two or more unpromoted¹⁴ pawns that belong to the same
player must not be in the same column simultaneously.

• Dead end: pawns, lances, and knights¹⁵ can never be moved or dropped onto cells
from which a subsequent move cannot be made. Therefore, white (resp., black)
unpromoted pawns and lances can never be in the first (resp., $\sqrt{n}$-th) row, and
white (resp., black) knights can never be in the first or second (resp., $\sqrt{n}$-th or
$(\sqrt{n} - 1)$-th) rows.


14 Pieces of some piece-types can be promoted (to a stronger piece) when they enter the opponent's
camp.
15 These three pieces can move only forward.
In a given position $S$, if there is a white piece that commits a foul, then white cannot
win, and thus the position is not a winner. However, if the number of pieces related to
fouls is small (e.g., smaller than $\varepsilon n/2$), we can remove the fouls and make white win,
i.e., the position is $\varepsilon$-close to $W$. To detect this, we need to perform preprocessing,
and the tester may err when the input is $\varepsilon$-far from $W$. For Xiangqi, a similar
discussion applies. See [13] for details.


2.6 Summary 


In this chapter, we introduced basic terminology and important results for property
testing, which is the most examined framework for constant- or sublinear-time
algorithms. In particular, we presented two of our recent results: The first is the complete
characterization of one-sided-error testable monotone or hereditary properties on
bounded-out-degree digraphs, and the other is the testers for the generalized
chess, Shogi, and Xiangqi problems, which are all EXPTIME-complete.

The 21st century can be called the era of big data, and the larger big data becomes,
the more we need sublinear- and constant-time algorithms. The importance of this
area will continue to grow. The number of fields in which constant-time algorithms are
efficiently applied will increase, and new techniques will be found accordingly. We
eagerly await these developments.
Chapter 3
Constant-Time Algorithms for
Continuous Optimization Problems


Yuichi Yoshida 


Abstract In this chapter, we consider constant-time algorithms for continuous opti- 
mization problems. Specifically, we consider quadratic function minimization and 
tensor decomposition, both of which have numerous applications in machine learn- 
ing and data mining. The key component in our analysis is graph limit theory, which 
was originally developed to study graphs analytically. 


3.1 Introduction 


In this chapter, we turn our attention to constant-time algorithms for continuous 
optimization problems. Specifically, we consider quadratic function minimization 
and tensor decomposition, both of which have numerous applications in machine 
learning and data mining. The key component in our analysis is graph limit theory, 
which was originally developed to study graphs analytically. 

We introduce graph limit theory in Sect. 3.2, and then discuss quadratic function
minimization and tensor decomposition in Sects. 3.3 and 3.4, respectively. Throughout
this chapter, we assume the real RAM model, in which we can perform basic
algebraic operations on real numbers in one step. For a positive integer $n$, let $[n]$
denote the set $\{1, 2, \ldots, n\}$. For real values $a, b, c \in \mathbb{R}$, $a = b \pm c$ is used as
shorthand for $b - c \le a \le b + c$. The algorithms and analysis presented in this chapter
are based on [5, 6].


3.2 Graph Limit Theory 


This section reviews the basic concepts of graph limit theory. For further details, 
refer to the book by Lovász [7]. 
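We call a (measurable) function $\mathcal{W} : [0, 1]^K \to \mathbb{R}$ a dikernel of order $K$. We
define

$$\|\mathcal{W}\|_F = \sqrt{\int_{[0,1]^K} \mathcal{W}(x)^2 \, dx}, \quad \text{(Frobenius norm)}$$

$$\|\mathcal{W}\|_{\max} = \max_{x \in [0,1]^K} |\mathcal{W}(x)|, \quad \text{(Max norm)}$$

$$\|\mathcal{W}\|_{\square} = \sup_{S_1, \ldots, S_K \subseteq [0,1]} \left| \int_{S_1 \times \cdots \times S_K} \mathcal{W}(x) \, dx \right|. \quad \text{(Cut norm)}$$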

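We note that these norms satisfy the triangle inequality. For two dikernels $\mathcal{W}$ and
$\mathcal{W}'$, we define their inner product as $\langle \mathcal{W}, \mathcal{W}' \rangle = \int_{[0,1]^K} \mathcal{W}(x)\mathcal{W}'(x) \, dx$. For a
dikernel $\mathcal{W} : [0, 1]^2 \to \mathbb{R}$ and a function $f : [0, 1] \to \mathbb{R}$, we define a function $\mathcal{W}f :
[0, 1] \to \mathbb{R}$ as $(\mathcal{W}f)(x) = \langle \mathcal{W}(x, \cdot), f \rangle$.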

Let $\lambda$ be the Lebesgue measure. A map $\pi : [0, 1] \to [0, 1]$ is said to be
measure-preserving if the pre-image $\pi^{-1}(X)$ is measurable for every measurable set $X$,
and $\lambda(\pi^{-1}(X)) = \lambda(X)$. A measure-preserving bijection is a measure-preserving
map whose inverse map exists and is also measurable (and, in turn, also
measure-preserving). For a measure-preserving bijection $\pi : [0, 1] \to [0, 1]$ and a dikernel
$\mathcal{W} : [0, 1]^K \to \mathbb{R}$, we define a dikernel $\pi(\mathcal{W}) : [0, 1]^K \to \mathbb{R}$ as $\pi(\mathcal{W})(x_1, \ldots, x_K) =
\mathcal{W}(\pi(x_1), \ldots, \pi(x_K))$.

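A partition $\mathcal{P} = (V_1, \ldots, V_p)$ of the interval $[0, 1]$ is called an equipartition if
$\lambda(V_i) = 1/p$ for every $i \in [p]$. For a dikernel $\mathcal{W} : [0, 1]^K \to \mathbb{R}$ and an equipartition
$\mathcal{P} = (V_1, \ldots, V_p)$ of $[0, 1]$, we define $\mathcal{W}_{\mathcal{P}} : [0, 1]^K \to \mathbb{R}$ as the dikernel obtained
by averaging each $V_{i_1} \times \cdots \times V_{i_K}$ for $i_1, \ldots, i_K \in [p]$. More formally, we define

$$\mathcal{W}_{\mathcal{P}}(x) = \frac{1}{\prod_{k \in [K]} \lambda(V_{i_k})} \int_{V_{i_1} \times \cdots \times V_{i_K}} \mathcal{W}(x') \, dx' = p^K \int_{V_{i_1} \times \cdots \times V_{i_K}} \mathcal{W}(x') \, dx',$$

where $i_k$ is the unique index such that $x_k \in V_{i_k}$ for each $k \in [K]$. The following
lemma states that any dikernel $\mathcal{W} : [0, 1]^K \to \mathbb{R}$ can be well approximated by $\mathcal{W}_{\mathcal{P}}$
for some equipartition $\mathcal{P}$ into a small number of parts.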


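Lemma 3.1 (Weak regularity lemma for dikernels [4]) Let $\mathcal{W}^1, \ldots, \mathcal{W}^T : [0, 1]^K \to
\mathbb{R}$ be dikernels. Then, for any $\varepsilon > 0$, there exists an equipartition $\mathcal{P}$ into $|\mathcal{P}| \le
2^{O(T/\varepsilon^{2K})}$ parts, such that for every $t \in [T]$,

$$\|\mathcal{W}^t - \mathcal{W}^t_{\mathcal{P}}\|_{\square} \le \varepsilon \|\mathcal{W}^t\|_F.$$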


We can construct the dikernel $\mathcal{X} : [0,1]^K \to \mathbb{R}$ from a tensor
$X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ as follows. For an integer $n \in \mathbb{N}$, let
$I^n_1 = [0, \frac{1}{n}]$, $I^n_2 = (\frac{1}{n}, \frac{2}{n}]$, ..., $I^n_n = (\frac{n-1}{n}, 1]$.
For $x \in [0,1]$, we define $i_n(x) \in [n]$ as the unique integer such that $x \in I^n_{i_n(x)}$. We then
define $\mathcal{X}(x_1,\ldots,x_K) = X_{i_{N_1}(x_1) \cdots i_{N_K}(x_K)}$. The main motivation of creating
a dikernel from a tensor is that, in doing so, we can define the distance between two tensors
$X$ and $Y$ of different sizes via the cut norm, that is, $|\mathcal{X} - \mathcal{Y}|_\square$, where
$\mathcal{X}$ and $\mathcal{Y}$ are dikernels corresponding to $X$ and $Y$, respectively.
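
The construction can be sketched in a few lines. The code below is only an illustration under
stated assumptions: the tensor is stored as a dictionary keyed by 1-based index tuples, and the
names `interval_index` and `dikernel_value` are hypothetical helpers, not notation from the text.

```python
import math


def interval_index(n, x):
    """Return i_n(x): the unique i in {1, ..., n} with x in I_i^n.

    I_1^n = [0, 1/n] and I_i^n = ((i-1)/n, i/n] for i >= 2.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    # ceil(n * x) is the right index except at x = 0, which belongs to I_1^n.
    return max(1, min(n, math.ceil(n * x)))


def dikernel_value(tensor, dims, point):
    """Evaluate the step-function dikernel of `tensor` at `point` in [0, 1]^K.

    tensor: dict mapping 1-based index tuples to floats (assumed storage).
    dims:   tuple (N_1, ..., N_K).
    """
    idx = tuple(interval_index(N, x) for N, x in zip(dims, point))
    return tensor[idx]
```

Because both dikernels live on $[0,1]^K$, a $2 \times 2$ tensor and a $3 \times 3$ tensor can be
evaluated at the same point, which is what allows tensors of different sizes to be compared.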

Let $\mathcal{W} : [0,1]^K \to \mathbb{R}$ be a dikernel and $S_k = (x^k_1,\ldots,x^k_s)$ for $k \in [K]$ be
sequences of elements in $[0,1]$. Then, we define a dikernel
$\mathcal{W}|_{S_1,\ldots,S_K} : [0,1]^K \to \mathbb{R}$ as follows: We first extract a tensor
$W \in \mathbb{R}^{s \times \cdots \times s}$ by setting
$W_{i_1 \cdots i_K} = \mathcal{W}(x^1_{i_1},\ldots,x^K_{i_K})$.
Next, we define $\mathcal{W}|_{S_1,\ldots,S_K}$ as the dikernel corresponding to $W$. The following
is the key technical lemma in the analysis of the algorithms given in the subsequent
sections.


Lemma 3.2 Let $\mathcal{W}^1,\ldots,\mathcal{W}^T : [0,1]^K \to [-L, L]$ be dikernels. Let $S_1,\ldots,S_K$
be sequences of $s$ elements uniformly and independently sampled from $[0,1]$. Then,
with probability at least $1 - \exp(-\Omega_K(s^2 (T/\log s)^{1/K}))$, there exists a
measure-preserving bijection $\pi : [0,1] \to [0,1]$ such that, for every $t \in [T]$, we have

\[ |\mathcal{W}^t - \pi(\mathcal{W}^t|_{S_1,\ldots,S_K})|_\square = L \cdot O_K\!\left( \left( \frac{T}{\log s} \right)^{1/(2K)} \right), \]

where $O_K(\cdot)$ and $\Omega_K(\cdot)$ hide factors depending on $K$.


3.3 Quadratic Function Minimization 


Background 

Quadratic functions are one of the most important function classes in machine learn- 
ing, statistics, and data mining. Many fundamental problems such as linear regression, 
k-means clustering, principal component analysis, support vector machines, and ker- 
nel methods can be formulated as a minimization problem of a quadratic function. 
See, e.g., [8] for more details. 

In some applications, it is sufficient to compute the minimum value of a quadratic 
function rather than its solution. For example, Yamada et al. [13] proposed an efficient 
method for estimating the Pearson divergence, which provides useful information 
about data, such as the density ratio [10]. They formulated the estimation problem 
as the minimization of a squared loss and showed that the Pearson divergence can be 
estimated from the minimum value. Least-squares mutual information [9] is another 
example that can be computed in a similar manner. 

Despite its importance, minimization of quadratic functions suffers from the issue
of scalability. Let $n \in \mathbb{N}$ be the number of variables. In general, this kind of
minimization problem can be solved by quadratic programming (QP), which requires
poly($n$) time. If the problem is convex and there are no constraints, then the problem
is reduced to solving a system of linear equations, which requires $O(n^3)$ time.
Both methods easily become infeasible, even for medium-scale problems of, say,
$n > 10000$.

Although several techniques have been proposed to accelerate quadratic function
minimization, they require at least linear time in $n$. This is problematic when handling
large-scale problems, where even linear time is slow or prohibitive. For example,
stochastic gradient descent (SGD) is an optimization method that is widely used
for large-scale problems. A nice property of this method is that, if the objective
function is strongly convex, it outputs a point that is sufficiently close to an optimal
solution after a constant number of iterations [1]. Nevertheless, each iteration needs
at least $\Omega(n)$ time to access the variables. Another popular technique is low-rank
approximation such as Nyström's method [12]. The underlying idea is to approximate
the input matrix by a low-rank matrix, which drastically reduces the time complexity.
However, we still need to compute the matrix-vector product of size $n$, which requires
$\Omega(n)$ time. Clarkson et al. [2] proposed sublinear-time algorithms for special cases of
quadratic function minimization. However, these are "sublinear" with respect to the
number of pairwise interactions of the variables, which is $\Theta(n^2)$, and the algorithms
require $O(n \log^c n)$ time for some $c \ge 1$.


Algorithm 1

Input: $n \in \mathbb{N}$, query access to a matrix $A \in \mathbb{R}^{n \times n}$ and to vectors $d, b \in \mathbb{R}^n$, and $\epsilon, \delta \in (0,1)$.
1: $S \leftarrow$ a sequence of $s = s(\epsilon, \delta)$ indices independently and uniformly sampled from $[n]$.
2: return $\frac{n^2}{s^2} \min_{v \in \mathbb{R}^s} p_{s, A|_S, d|_S, b|_S}(v)$.


Constant-time algorithm for quadratic function minimization 
Let $A \in \mathbb{R}^{n \times n}$ be a matrix and $d, b \in \mathbb{R}^n$ be vectors. Then, we consider the following
quadratic problem:

\[ \operatorname*{minimize}_{v \in \mathbb{R}^n} \; p_{n,A,d,b}(v), \quad \text{where } p_{n,A,d,b}(v) = \langle v, Av \rangle + n \langle v, \operatorname{diag}(d) v \rangle + n \langle b, v \rangle, \quad (3.1) \]

where $\langle \cdot, \cdot \rangle$ denotes the inner product and $\operatorname{diag}(d)$ denotes a diagonal matrix in which
the diagonal entries are specified by $d$. Note that although a constant term can be
included in (3.1), it is omitted here because it is irrelevant when optimizing (3.1).

Let $z^* \in \mathbb{R}$ be the optimal value of (3.1) and let $\epsilon, \delta \in (0,1)$ be parameters.
Our goal is to compute $z$ with $|z - z^*| = O(\epsilon n^2)$ with probability at least $1 - \delta$
in constant time. We further assume that we have query access to $A$, $b$, and $d$, with
which we can obtain any of their entries by specifying an index. We note that $z^*$ is typically
$\Theta(n^2)$ because $\langle v, Av \rangle$ consists of $\Theta(n^2)$ terms, and $\langle v, \operatorname{diag}(d) v \rangle$ and
$\langle b, v \rangle$ consist of $\Theta(n)$ terms, each scaled by $n$ in (3.1). Hence, we can regard the error of
$\Theta(\epsilon n^2)$ as an error of $\Theta(\epsilon)$ for each term, which is reasonably small in typical situations.

Let $\cdot|_S$ be an operator that extracts a submatrix (or subvector) specified by an
index set $S \subset \mathbb{N}$. Our algorithm is then given by Algorithm 1, where the parameter
$s := s(\epsilon, \delta)$ is determined later. In other words, we sample a constant number of
indices from the set $[n]$, and then solve the problem (3.1) restricted to these indices.
Note that the number of queries and the time complexity are $O(s^2)$ and poly($s$),
respectively.
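
The following is a minimal sketch of Algorithm 1, not the authors' implementation. It assumes
that query access is modelled by three callables `A(i, j)`, `d(i)`, and `b(i)` with 0-based
indices, that the sample size `s` is supplied by the caller, and that the sampled problem is
strictly convex so that its minimum is attained at the unique stationary point. The names
`solve_linear` and `estimate_quadratic_min` are hypothetical.

```python
import random


def solve_linear(H, rhs):
    """Solve H x = rhs by Gaussian elimination with partial pivoting."""
    m = len(rhs)
    M = [list(H[i]) + [rhs[i]] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: sampled problem is not strictly convex")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        acc = M[i][m] - sum(M[i][c] * x[c] for c in range(i + 1, m))
        x[i] = acc / M[i][i]
    return x


def estimate_quadratic_min(n, A, d, b, s, rng):
    """Estimate the optimal value of (3.1) from s sampled indices.

    A, d, b are query callables (an assumption of this sketch).
    Only O(s^2) entries of A are read, independently of n.
    """
    S = [rng.randrange(n) for _ in range(s)]
    # Symmetrized quadratic form of p_{s, A|S, d|S, b|S}.
    H = [[(A(S[i], S[j]) + A(S[j], S[i])) / 2.0 + (s * d(S[i]) if i == j else 0.0)
          for j in range(s)] for i in range(s)]
    bS = [b(i) for i in S]
    # Stationarity condition: 2 H v + s * b|S = 0.
    v = solve_linear(H, [-s * bi / 2.0 for bi in bS])
    # At the stationary point, <v, H v> + s <b|S, v> equals s <b|S, v> / 2.
    z_sub = s * sum(bi * vi for bi, vi in zip(bS, v)) / 2.0
    return (n * n) / (s * s) * z_sub
```

For instance, with $A = 0$ and $d = b = \mathbf{1}$, problem (3.1) has optimal value $-n^2/4$, and
the sketch returns exactly this value for any sample, because every sampled subproblem has
optimal value $-s^2/4$.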

The goal of the rest of this section is to show the following approximation guar- 
antee of Algorithm 1. 


Theorem 3.1 Let $v^*$ and $z^*$ be an optimal solution and the optimal value, respectively,
of problem (3.1). By choosing $s(\epsilon, \delta) = 2^{\Theta(1/\epsilon^2)} + \Theta(\log\frac{1}{\delta} \log\log\frac{1}{\delta})$, with
probability at least $1 - \delta$, a sequence $S$ of $s$ indices independently and uniformly
sampled from $[n]$ satisfies the following: Let $\tilde{v}^*$ and $\tilde{z}^*$ be an optimal solution and the
optimal value, respectively, of the problem $\min_{v \in \mathbb{R}^s} p_{s, A|_S, d|_S, b|_S}(v)$. Then, we have

\[ \left| z^* - \frac{n^2}{s^2} \tilde{z}^* \right| \le \epsilon L M^2 n^2, \]

where

\[ L = \max\left\{ \max_{i,j} |A_{ij}|, \max_i |d_i|, \max_i |b_i| \right\} \quad \text{and} \quad M = \max\left\{ \max_{i \in [n]} |v^*_i|, \max_{i \in [s]} |\tilde{v}^*_i| \right\}. \]

We can show that $M$ is bounded when $A$ is symmetric and full rank. To see this, we
first note that we can assume $A + n\operatorname{diag}(d)$ is positive-definite, as otherwise $p_{n,A,d,b}$
is not bounded and the problem is uninteresting. Then, for any set $S \subseteq [n]$ of $s$ indices,
$(A + n\operatorname{diag}(d))|_S$ is again positive-definite because it is a principal submatrix. Hence,
we have $v^* = (A + n\operatorname{diag}(d))^{-1} nb/2$ and $\tilde{v}^* = (A|_S + n\operatorname{diag}(d|_S))^{-1} nb|_S/2$, which
means that $M$ is bounded.


3.3.1 Proof of Theorem 3.1 


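To use dikernels in our analysis, we first introduce a continuous version of $p_{n,A,d,b}$.
The real-valued function $P_{n,A,d,b}$ on the functions $f : [0,1] \to \mathbb{R}$ is defined as

\[ P_{n,A,d,b}(f) = \langle f, \mathcal{A} f \rangle + \langle f^2, \mathcal{D} \mathbf{1} \rangle + \langle f, \mathcal{B} \mathbf{1} \rangle, \]

where $\mathcal{A}$ is the dikernel corresponding to $A$, $\mathcal{D}$ and $\mathcal{B}$ are the dikernels
corresponding to $d\mathbf{1}^\top$ and $b\mathbf{1}^\top$, respectively, $f^2 : [0,1] \to \mathbb{R}$ is a function such that
$f^2(x) = f(x)^2$ for every $x \in [0,1]$, and $\mathbf{1} : [0,1] \to \mathbb{R}$ is a constant function that has a
value of 1 everywhere. The following lemma states that the minimizations of $p_{n,A,d,b}$ and
$P_{n,A,d,b}$ are equivalent: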


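Lemma 3.3 Let $A \in \mathbb{R}^{n \times n}$ be a matrix and $d, b \in \mathbb{R}^n$ be vectors. Then, we have

\[ \min_{v \in [-M,M]^n} p_{n,A,d,b}(v) = n^2 \cdot \inf_{f : [0,1] \to [-M,M]} P_{n,A,d,b}(f) \]

for any $M > 0$.

Proof First, we show that $n^2 \cdot \inf_{f : [0,1] \to [-M,M]} P_{n,A,d,b}(f) \le \min_{v \in [-M,M]^n} p_{n,A,d,b}(v)$.
Given a vector $v \in [-M,M]^n$, we define $f : [0,1] \to [-M,M]$ as $f(x) = v_{i_n(x)}$. Then,

\[ \langle f, \mathcal{A} f \rangle = \sum_{i,j \in [n]} \int_{I^n_i} \int_{I^n_j} A_{ij} f(x) f(y) \, dx \, dy = \frac{1}{n^2} \sum_{i,j \in [n]} A_{ij} v_i v_j = \frac{1}{n^2} \langle v, Av \rangle, \]

\[ \langle f^2, \mathcal{D} \mathbf{1} \rangle = \sum_{i,j \in [n]} \int_{I^n_i} \int_{I^n_j} d_i f(x)^2 \, dx \, dy = \sum_{i \in [n]} \int_{I^n_i} d_i f(x)^2 \, dx = \frac{1}{n} \sum_{i \in [n]} d_i v_i^2 = \frac{1}{n} \langle v, \operatorname{diag}(d) v \rangle, \]

\[ \langle f, \mathcal{B} \mathbf{1} \rangle = \sum_{i,j \in [n]} \int_{I^n_i} \int_{I^n_j} b_i f(x) \, dx \, dy = \sum_{i \in [n]} \int_{I^n_i} b_i f(x) \, dx = \frac{1}{n} \sum_{i \in [n]} b_i v_i = \frac{1}{n} \langle v, b \rangle. \]

Hence, we have $n^2 P_{n,A,d,b}(f) \le p_{n,A,d,b}(v)$.

Next, we show that $\min_{v \in [-M,M]^n} p_{n,A,d,b}(v) \le n^2 \cdot \inf_{f : [0,1] \to [-M,M]} P_{n,A,d,b}(f)$.
Let $f : [0,1] \to [-M,M]$ be a measurable function. For $x \in [0,1]$, we then have

\[ \frac{\partial P_{n,A,d,b}(f)}{\partial f(x)} = \sum_{i \in [n]} \int_{I^n_i} A_{i\, i_n(x)} f(y) \, dy + \sum_{j \in [n]} \int_{I^n_j} A_{i_n(x)\, j} f(y) \, dy + 2 d_{i_n(x)} f(x) + b_{i_n(x)}. \]

Note that the form of this partial derivative depends on only $i_n(x)$. Hence, in the
optimal solution $f^* : [0,1] \to [-M,M]$, we can assume $f^*(x) = f^*(y)$ if $i_n(x) = i_n(y)$.
In other words, $f^*$ is constant on each of the intervals $I^n_1,\ldots,I^n_n$. For such
$f^*$, we define the vector $v \in \mathbb{R}^n$ as $v_i = f^*(x)$, where $x \in [0,1]$ is any element in
$I^n_i$. Then, we have

\[ \langle v, Av \rangle = \sum_{i,j \in [n]} A_{ij} v_i v_j = n^2 \sum_{i,j \in [n]} \int_{I^n_i} \int_{I^n_j} A_{ij} f^*(x) f^*(y) \, dx \, dy = n^2 \langle f^*, \mathcal{A} f^* \rangle, \]

\[ \langle v, \operatorname{diag}(d) v \rangle = \sum_{i \in [n]} d_i v_i^2 = n \sum_{i \in [n]} \int_{I^n_i} d_i f^*(x)^2 \, dx = n \langle (f^*)^2, \mathcal{D} \mathbf{1} \rangle, \]

\[ \langle v, b \rangle = \sum_{i \in [n]} b_i v_i = n \sum_{i \in [n]} \int_{I^n_i} b_i f^*(x) \, dx = n \langle f^*, \mathcal{B} \mathbf{1} \rangle. \]

Hence, we have $p_{n,A,d,b}(v) \le n^2 P_{n,A,d,b}(f^*)$.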


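Proof (of Theorem 3.1) We instantiate Lemma 3.2 with $s = 2^{\Theta(1/\epsilon^2)} + \Theta(\log\frac{1}{\delta} \log\log\frac{1}{\delta})$
and the dikernels $\mathcal{A}$, $\mathcal{D}$, and $\mathcal{B}$. Then, with probability at least $1 - \delta$, there exists a
measure-preserving bijection $\pi : [0,1] \to [0,1]$ such that

\[ \max\left\{ |\langle f, (\mathcal{A} - \pi(\mathcal{A}|_S)) f \rangle|, \; |\langle f^2, (\mathcal{D} - \pi(\mathcal{D}|_S)) \mathbf{1} \rangle|, \; |\langle f, (\mathcal{B} - \pi(\mathcal{B}|_S)) \mathbf{1} \rangle| \right\} \le \frac{\epsilon L M^2}{3} \]

for any function $f : [0,1] \to [-M,M]$. Conditioned on this event, and writing $a = b \pm c$ as
shorthand for $b - c \le a \le b + c$, we have

\[ \begin{aligned}
\tilde{z}^* &= \min_{v \in \mathbb{R}^s} p_{s, A|_S, d|_S, b|_S}(v) = \min_{v \in [-M,M]^s} p_{s, A|_S, d|_S, b|_S}(v) \\
&= s^2 \cdot \inf_{f : [0,1] \to [-M,M]} P_{s, A|_S, d|_S, b|_S}(f) && \text{(by Lemma 3.3)} \\
&= s^2 \cdot \inf_{f : [0,1] \to [-M,M]} \Big( \langle f, (\pi(\mathcal{A}|_S) - \mathcal{A}) f \rangle + \langle f, \mathcal{A} f \rangle + \langle f^2, (\pi(\mathcal{D}|_S) - \mathcal{D}) \mathbf{1} \rangle \\
&\qquad\qquad + \langle f^2, \mathcal{D} \mathbf{1} \rangle + \langle f, (\pi(\mathcal{B}|_S) - \mathcal{B}) \mathbf{1} \rangle + \langle f, \mathcal{B} \mathbf{1} \rangle \Big) \\
&= s^2 \cdot \inf_{f : [0,1] \to [-M,M]} \Big( \langle f, \mathcal{A} f \rangle + \langle f^2, \mathcal{D} \mathbf{1} \rangle + \langle f, \mathcal{B} \mathbf{1} \rangle \pm \epsilon L M^2 \Big) \\
&= \frac{s^2}{n^2} \cdot \min_{v \in [-M,M]^n} p_{n,A,d,b}(v) \pm \epsilon L M^2 s^2 && \text{(by Lemma 3.3)} \\
&= \frac{s^2}{n^2} \cdot \min_{v \in \mathbb{R}^n} p_{n,A,d,b}(v) \pm \epsilon L M^2 s^2 = \frac{s^2}{n^2} z^* \pm \epsilon L M^2 s^2.
\end{aligned} \]

Rearranging the inequality, we obtain the desired result.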


3.4 Tensor Decomposition 


Background 

We say that a tensor (or a multidimensional array) is of order K if it is a K- 
dimensional array. Each dimension is called a mode in tensor terminology. Tensor 
decomposition, which approximates the input tensor by a number of smaller tensors, 
is a fundamental tool for dealing with large tensors because it drastically reduces 
memory usage. 

Among the many existing tensor decomposition methods, Tucker decomposi- 
tion [11] is a popular choice. To some extent, Tucker decomposition is analogous to 
singular-value decomposition (SVD). Whereas SVD decomposes a matrix into left 
and right singular vectors that interact via singular values, Tucker decomposition of 
an order-K tensor consists of K factor matrices that interact via the so-called core 
tensor. The key difference between SVD and Tucker decomposition is that, in the 
latter, the core tensor does not need to be diagonal and its “rank” can differ for each 
mode. We refer to the size of the core tensor, which is a K-tuple, as the Tucker rank 
of a Tucker decomposition. 

We are usually interested in obtaining factor matrices and a core tensor to minimize 
the residual error—the error between the input and low-rank approximated tensors. 
Sometimes, however, knowing the residual error itself is a task of interest. The 
residual error tells us how suitable a low-rank approximation is to approximate the 
input tensor in the first place, and is also useful to predetermine the Tucker rank. 
In real applications, Tucker ranks are not explicitly given, and we must select them 
by considering the tradeoff between space usage and approximation accuracy. For
example, if the selected Tucker rank is too small, we risk losing essential information
in the input tensor, whereas if the selected Tucker rank is too large, the computational 
cost of computing the Tucker decomposition (even if we allow for approximation 
methods) increases considerably along with space usage. As with the case of the 
matrix rank, one might think that a reasonably good Tucker rank can be found using 
a grid search. Unfortunately, grid search for an appropriate Tucker rank is challenging 
because, for an order-K tensor, the Tucker rank consists of K free parameters and 
the search space grows exponentially in K. Hence, we want to evaluate each grid 
point as quickly as possible. 

Although several practical algorithms have been proposed, such as the higher order
orthogonal iteration (HOOI) [3], they are not sufficiently scalable. For each mode,
HOOI iteratively applies SVD to an unfolded tensor, that is, a matrix that is reshaped from
the input tensor. Given an $N_1 \times \cdots \times N_K$ tensor, the computational cost is hence
$O(K \max_k N_k \cdot \prod_k N_k)$, which crucially depends on the input size $N_1,\ldots,N_K$.
Although there are several approximation algorithms, their computational costs are
still intensive.


Constant-time algorithm for the Tucker fitting problem 

The problem of computing the residual error is formalized as the following Tucker
fitting problem: Given an order-$K$ tensor $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ and integers
$R_k \le N_k$ ($k = 1,\ldots,K$), we want to compute the following normalized residual error:

\[ \ell_{R_1,\ldots,R_K}(X) := \min_{G \in \mathbb{R}^{R_1 \times \cdots \times R_K},\; \{U^{(k)} \in \mathbb{R}^{N_k \times R_k}\}_{k \in [K]}} \frac{\left| X - [\![ G; U^{(1)},\ldots,U^{(K)} ]\!] \right|_F^2}{\prod_{k \in [K]} N_k}, \quad (3.2) \]

where $[\![ G; U^{(1)},\ldots,U^{(K)} ]\!] \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ is an order-$K$ tensor, defined as

\[ [\![ G; U^{(1)},\ldots,U^{(K)} ]\!]_{i_1 \cdots i_K} = \sum_{r_1 \in [R_1],\ldots,r_K \in [R_K]} G_{r_1 \cdots r_K} \prod_{k \in [K]} U^{(k)}_{i_k r_k} \]

for every $i_1 \in [N_1],\ldots,i_K \in [N_K]$. Here, $G$ is the core tensor, and $U^{(1)},\ldots,U^{(K)}$
are the factor matrices. Note that we are not concerned with computing the minimizer,
but only want to compute the minimum value. In addition, we do not need the
exact minimum. Indeed, a rough estimate still helps to narrow down promising rank
candidates. The question here is how quickly we can compute the normalized residual
$\ell_{R_1,\ldots,R_K}(X)$ with moderate accuracy.
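
To make the objective in (3.2) concrete, the sketch below evaluates the Tucker reconstruction and
the normalized squared error for a fixed core tensor and fixed factor matrices. It does not
perform the minimization. The storage format (dictionaries keyed by 0-based index tuples, factor
matrices as nested lists) and the names `tucker_entry` and `normalized_residual` are assumptions
made for illustration.

```python
from itertools import product


def tucker_entry(G, U, idx, ranks):
    """Entry [[G; U^(1), ..., U^(K)]]_idx.

    G: dict mapping rank tuples (r_1, ..., r_K) to floats.
    U: list of K factor matrices, U[k][i][r] = U^(k)_{i r}.
    """
    total = 0.0
    for r in product(*(range(R) for R in ranks)):
        term = G[r]
        for k, (i, rk) in enumerate(zip(idx, r)):
            term *= U[k][i][rk]
        total += term
    return total


def normalized_residual(X, G, U, dims, ranks):
    """|X - [[G; U]]|_F^2 / prod_k N_k for the given (not optimized) G and U."""
    err = 0.0
    for idx in product(*(range(N) for N in dims)):
        err += (X[idx] - tucker_entry(G, U, idx, ranks)) ** 2
    size = 1
    for N in dims:
        size *= N
    return err / size
```

The quantity $\ell_{R_1,\ldots,R_K}(X)$ is the minimum of `normalized_residual` over all choices of
`G` and `U` with the prescribed ranks.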

In this section, we consider the following simple sampling algorithm, and
show that it can be used to approximately solve the Tucker fitting problem. First,
given an order-$K$ tensor $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$, Tucker rank $(R_1,\ldots,R_K)$, and
sample size $s \in \mathbb{N}$, we sample a sequence of indices $S_k = (x^k_1,\ldots,x^k_s)$ uniformly
and independently from $[N_k]$ for each mode $k \in [K]$. We then construct a mini-tensor
$X|_{S_1,\ldots,S_K} \in \mathbb{R}^{s \times \cdots \times s}$, where
$(X|_{S_1,\ldots,S_K})_{i_1 \cdots i_K} = X_{x^1_{i_1} \cdots x^K_{i_K}}$. Finally, we compute
$\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K})$ using an arbitrary solver, such as HOOI, and output the
obtained value. The details are provided in Algorithm 2. Note that the time complexity
for computing $\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K})$ does not depend on the input size $N_1,\ldots,N_K$
but rather on the sample size $s$, meaning that the algorithm runs in constant time,
regardless of the input size.


Algorithm 2 Sampling algorithm for the Tucker fitting problem

Input: $N_1,\ldots,N_K \in \mathbb{N}$, query access to a tensor $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$, Tucker rank $(R_1,\ldots,R_K)$, and $\epsilon, \delta \in (0,1)$.
1: for $k = 1$ to $K$ do
2:     $S_k \leftarrow$ a sequence of $s = s(\epsilon, \delta)$ indices uniformly and independently sampled from $[N_k]$.
3: Construct a mini-tensor $X|_{S_1,\ldots,S_K}$.
4: return $\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K})$.
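
Steps 1 to 3 of Algorithm 2 can be sketched as follows. This is an illustration under
assumptions: query access is modelled by a callable `query(idx)` taking a 0-based index tuple,
the sample size `s` is passed in directly, and the name `sample_mini_tensor` is hypothetical.
Step 4 (running a solver such as HOOI on the mini-tensor) is not shown.

```python
import random
from itertools import product


def sample_mini_tensor(dims, query, s, rng):
    """Sample s indices per mode and build the s x ... x s mini-tensor.

    dims:  tuple (N_1, ..., N_K).
    query: callable returning X[idx] for a 0-based index tuple (assumed access model).
    Returns (samples, mini), where mini maps positions in range(s)^K to entries of X.
    """
    K = len(dims)
    samples = [[rng.randrange(N) for _ in range(s)] for N in dims]
    mini = {}
    for pos in product(range(s), repeat=K):
        idx = tuple(samples[k][pos[k]] for k in range(K))
        mini[pos] = query(idx)
    return samples, mini
```

Only $s^K$ entries of $X$ are queried, so the cost of this step is independent of
$N_1,\ldots,N_K$; the tensor itself never has to be materialized.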

The goal of the rest of this section is to show the following approximation guar- 
antee of Algorithm 2. 


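Theorem 3.2 Let $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ be a tensor, $R_1,\ldots,R_K$ be integers, and
$\epsilon, \delta \in (0,1)$. For $s(\epsilon, \delta) = 2^{\Theta(1/\epsilon^{2K-2})} + \Theta(\log\frac{1}{\delta} \log\log\frac{1}{\delta})$, we have the
following. Let $S_1,\ldots,S_K$ be sequences of indices as defined in Algorithm 2. Let
$(G^*, U^*_1,\ldots,U^*_K)$ and $(\tilde{G}^*, \tilde{U}^*_1,\ldots,\tilde{U}^*_K)$ be minimizers of problem (3.2) on $X$ and
$X|_{S_1,\ldots,S_K}$, respectively, for which the factor matrices are orthonormal. Then we have

\[ \ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K}) = \ell_{R_1,\ldots,R_K}(X) \pm O(\epsilon L^2 (1 + 2MR)), \]

with probability at least $1 - \delta$, where $L = |X|_{\max}$, $M = \max\{|G^*|_{\max}, |\tilde{G}^*|_{\max}\}$, and
$R = \prod_{k \in [K]} R_k$.


We remark that, for the matrix case (i.e., $K = 2$), $|G^*|_{\max}$ and $|\tilde{G}^*|_{\max}$ are equal to
the maximum singular values of the original and sampled matrices, respectively.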


3.4.1 Preliminaries 


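Let $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ be a tensor. We define

\[ |X|_F = \sqrt{\sum_{i_1,\ldots,i_K} X_{i_1 \cdots i_K}^2}, \quad \text{(Frobenius norm)} \]

\[ |X|_{\max} = \max_{i_1 \in [N_1],\ldots,i_K \in [N_K]} |X_{i_1 \cdots i_K}|, \quad \text{(Max norm)} \]

\[ |X|_\square = \max_{S_1 \subseteq [N_1],\ldots,S_K \subseteq [N_K]} \left| \sum_{i_1 \in S_1,\ldots,i_K \in S_K} X_{i_1 \cdots i_K} \right|. \quad \text{(Cut norm)} \]

We note that these norms satisfy the triangle inequality.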
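
For intuition, the three norms can be computed directly in the matrix case ($K = 2$). The sketch
below is a brute-force illustration only: the cut norm enumerates all row subsets and is
therefore exponential in the number of rows. The function names are illustrative.

```python
def frobenius_norm(X):
    """Square root of the sum of squared entries of a matrix (list of lists)."""
    return sum(x * x for row in X for x in row) ** 0.5


def max_norm(X):
    """Largest absolute entry."""
    return max(abs(x) for row in X for x in row)


def cut_norm(X):
    """Max over row subsets S1 and column subsets S2 of |sum of X over S1 x S2|.

    For a fixed row subset, the best column subset takes either all columns
    with positive partial sums or all columns with negative partial sums.
    """
    n_rows, n_cols = len(X), len(X[0])
    best = 0.0
    for mask in range(1 << n_rows):
        col_sums = [sum(X[i][j] for i in range(n_rows) if (mask >> i) & 1)
                    for j in range(n_cols)]
        pos = sum(c for c in col_sums if c > 0)
        neg = -sum(c for c in col_sums if c < 0)
        best = max(best, pos, neg)
    return best
```

For the matrix with rows $(1, -1)$ and $(-1, 1)$, the Frobenius norm is 2 while the cut norm is
only 1, since positive and negative entries cancel over larger index sets.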
For a vector $v \in \mathbb{R}^n$ and a sequence $S = (x_1,\ldots,x_s)$ of indices in $[n]$, we define
the restriction $v|_S \in \mathbb{R}^s$ of $v$ as $(v|_S)_i = v_{x_i}$ for $i \in [s]$. Let
$X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ be a tensor, and $S_k = (x^k_1,\ldots,x^k_s)$ be a sequence of indices
in $[N_k]$ for each mode $k \in [K]$. Then, we define the restriction
$X|_{S_1,\ldots,S_K} \in \mathbb{R}^{s \times \cdots \times s}$ of $X$ to $S_1 \times \cdots \times S_K$ as
$(X|_{S_1,\ldots,S_K})_{i_1 \cdots i_K} = X_{x^1_{i_1} \cdots x^K_{i_K}}$ for each $i_1,\ldots,i_K \in [s]$.


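For a tensor $G \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ and vector-valued functions
$\{F^{(k)} : [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}$, we define an order-$K$ dikernel
$[\![ G; F^{(1)},\ldots,F^{(K)} ]\!] : [0,1]^K \to \mathbb{R}$ as

\[ [\![ G; F^{(1)},\ldots,F^{(K)} ]\!](x_1,\ldots,x_K) = \sum_{r_1 \in [R_1],\ldots,r_K \in [R_K]} G_{r_1 \cdots r_K} \prod_{k \in [K]} F^{(k)}(x_k)_{r_k}. \]

We note that $[\![ G; F^{(1)},\ldots,F^{(K)} ]\!]$ is a continuous analogue of Tucker decomposition.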


3.4.2 Proof of Theorem 3.2 


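To prove Theorem 3.2, we first consider the dikernel counterpart to the Tucker fitting
problem, in which we want to minimize the following:

\[ \ell_{R_1,\ldots,R_K}(\mathcal{X}) := \inf_{G \in \mathbb{R}^{R_1 \times \cdots \times R_K},\; \{f^{(k)} : [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}} \left| \mathcal{X} - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!] \right|_F^2. \quad (3.3) \]

The following lemma, which is proved in Sect. 3.4.3, states that the Tucker fitting
problem and its dikernel counterpart have the same optimum values.


Lemma 3.4 Let $X \in \mathbb{R}^{N_1 \times \cdots \times N_K}$ be a tensor, and let $R_1,\ldots,R_K \in \mathbb{N}$ be integers.
Then, we have

\[ \ell_{R_1,\ldots,R_K}(X) = \ell_{R_1,\ldots,R_K}(\mathcal{X}). \]

For a set of vector-valued functions $F = \{f^{(k)} : [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}$, we define
$|F|_{\max} = \max_{k \in [K], r \in [R_k], x \in [0,1]} |f^{(k)}_r(x)|$. For a dikernel $\mathcal{X} : [0,1]^K \to \mathbb{R}$, we define a
dikernel $\mathcal{X}^2 : [0,1]^K \to \mathbb{R}$ as $\mathcal{X}^2(x) = \mathcal{X}(x)^2$ for every $x \in [0,1]^K$. The following
lemma, which is proved in Sect. 3.4.4, states that if $\mathcal{X}$ and $\mathcal{Y}$ are close in the cut
norm, then the optimum values when the Tucker fitting problem is applied to them
are also close.


Lemma 3.5 Let $\mathcal{X}, \mathcal{Y} : [0,1]^K \to \mathbb{R}$ be dikernels with $|\mathcal{X} - \mathcal{Y}|_\square \le \epsilon$ and
$|\mathcal{X}^2 - \mathcal{Y}^2|_\square \le \epsilon$. For $R_1,\ldots,R_K \in \mathbb{N}$, we have

\[ \ell_{R_1,\ldots,R_K}(\mathcal{X}) = \ell_{R_1,\ldots,R_K}(\mathcal{Y}) \pm 2\epsilon \left( 1 + R \left( |G_{\mathcal{X}}|_{\max} |F_{\mathcal{X}}|^K_{\max} + |G_{\mathcal{Y}}|_{\max} |F_{\mathcal{Y}}|^K_{\max} \right) \right), \]

where $(G_{\mathcal{X}}, F_{\mathcal{X}} = \{f^{(k)}_{\mathcal{X}}\}_{k \in [K]})$ and $(G_{\mathcal{Y}}, F_{\mathcal{Y}} = \{f^{(k)}_{\mathcal{Y}}\}_{k \in [K]})$ are solutions to
problem (3.3) on $\mathcal{X}$ and $\mathcal{Y}$, respectively, which have objective values exceeding the infima
by at most $\epsilon$, and $R = \prod_{k \in [K]} R_k$.

Proof (of Theorem 3.2) We apply Lemma 3.2 to $\mathcal{X}$ and $\mathcal{X}^2$. Thus, with probability at
least $1 - \delta$, there exists a measure-preserving bijection $\pi : [0,1] \to [0,1]$ such that

\[ |\mathcal{X} - \pi(\mathcal{X}|_{S_1,\ldots,S_K})|_\square \le \epsilon L \quad \text{and} \quad |\mathcal{X}^2 - \pi(\mathcal{X}^2|_{S_1,\ldots,S_K})|_\square \le \epsilon L^2. \]

In the following, we assume that this has happened. By Lemma 3.5 and the fact that
$\ell_{R_1,\ldots,R_K}(\mathcal{X}|_{S_1,\ldots,S_K}) = \ell_{R_1,\ldots,R_K}(\pi(\mathcal{X}|_{S_1,\ldots,S_K}))$, we have

\[ \ell_{R_1,\ldots,R_K}(\mathcal{X}|_{S_1,\ldots,S_K}) = \ell_{R_1,\ldots,R_K}(\mathcal{X}) \pm \epsilon L^2 \left( 1 + 2R \left( |G|_{\max} |F|^K_{\max} + |\tilde{G}|_{\max} |\tilde{F}|^K_{\max} \right) \right), \]

where $(G, F = \{f^{(k)}\}_{k \in [K]})$ and $(\tilde{G}, \tilde{F} = \{\tilde{f}^{(k)}\}_{k \in [K]})$ are as in the statement of
Lemma 3.5. From the proof of Lemma 3.4, we can assume that $|G|_{\max} = |G^*|_{\max}$,
$|\tilde{G}|_{\max} = |\tilde{G}^*|_{\max}$, $|F|_{\max} \le 1$, and $|\tilde{F}|_{\max} \le 1$ (owing to the orthonormality of
$U^*_1,\ldots,U^*_K$ and $\tilde{U}^*_1,\ldots,\tilde{U}^*_K$). It follows that

\[ \ell_{R_1,\ldots,R_K}(\mathcal{X}|_{S_1,\ldots,S_K}) = \ell_{R_1,\ldots,R_K}(\mathcal{X}) \pm \epsilon L^2 \left( 1 + 2R \left( |G^*|_{\max} + |\tilde{G}^*|_{\max} \right) \right). \quad (3.4) \]

Therefore,

\[ \begin{aligned}
\ell_{R_1,\ldots,R_K}(X|_{S_1,\ldots,S_K}) &= \ell_{R_1,\ldots,R_K}(\mathcal{X}|_{S_1,\ldots,S_K}) && \text{(by Lemma 3.4)} \\
&= \ell_{R_1,\ldots,R_K}(\mathcal{X}) \pm \epsilon L^2 \left( 1 + 2R \left( |G^*|_{\max} + |\tilde{G}^*|_{\max} \right) \right) && \text{(by (3.4))} \\
&= \ell_{R_1,\ldots,R_K}(X) \pm \epsilon L^2 \left( 1 + 2R \left( |G^*|_{\max} + |\tilde{G}^*|_{\max} \right) \right). && \text{(by Lemma 3.4)}
\end{aligned} \]

Hence, the proof is complete.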


3.4.3 Proof of Lemma 3.4 


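We say that a vector-valued function $f : [0,1] \to \mathbb{R}^R$ is orthonormal if $\langle f_r, f_r \rangle = 1$
for every $r \in [R]$ and $\langle f_r, f_{r'} \rangle = 0$ if $r \ne r'$. First, we calculate the partial derivatives
of the objective function. We omit the proof because it is a straightforward (but
tedious) calculation.


Lemma 3.6 Let $\mathcal{X} : [0,1]^K \to \mathbb{R}$ be a dikernel, $G \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ be a tensor, and
$\{f^{(k)} : [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}$ be a set of orthonormal vector-valued functions. Then,
we have

\[ \begin{aligned}
\frac{\partial}{\partial f^{(k_0)}_{r_0}(x_0)} \left| \mathcal{X} - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!] \right|_F^2
&= 2 \sum_{r_1,\ldots,r_K} G_{r_1 \cdots r_K} \, G_{r_1 \cdots r_{k_0 - 1} r_0 r_{k_0 + 1} \cdots r_K} \, f^{(k_0)}_{r_{k_0}}(x_0) \\
&\quad - 2 \sum_{r_1,\ldots,r_K : r_{k_0} = r_0} G_{r_1 \cdots r_K} \int_{[0,1]^K : x_{k_0} = x_0} \mathcal{X}(x) \prod_{k \in [K] \setminus \{k_0\}} f^{(k)}_{r_k}(x_k) \, dx.
\end{aligned} \]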


Грк 


Proof (of Lemma 3.4) First, we show that (LHS) $\le$ (RHS). Consider a sequence of
solutions for the continuous problem (3.3) for which the objective values attain the
infimum. For Tucker decompositions, it is well known that there exists a minimizer
for which the factor matrices $U^{(1)},\ldots,U^{(K)}$ are orthonormal. By similar reasoning,
we can show that the vector-valued functions $f^{(1)},\ldots,f^{(K)}$ in each solution of the
sequence are orthonormal. As the objective function is coercive with respect to tensor
$G$, we can take a subsequence for which $G$ converges. Let $G^*$ be the limit. Now,
for any $\delta > 0$, we can create a matrix $\tilde{G}$ by perturbing $G^*$ so that (i) by fixing $G$
to $\tilde{G}$ in the continuous problem, the infimum increases only by $\delta$, and (ii) a matrix
constructed from $\tilde{G}$ is invertible and has a condition number at least $\delta' = \delta'(\delta) > 0$.

Now, consider a sequence of solutions for the continuous problem (3.3) with $G$
fixed to $\tilde{G}$ for which the objective values attain the infimum. We can show that the
partial derivatives converge to zero almost everywhere. For any $\epsilon > 0$, there then
exists a solution $(\tilde{G}, f^{(1)},\ldots,f^{(K)})$ in the sequence such that the partial derivatives
are at most $\epsilon$ almost everywhere.

Then by Lemma 3.6, for any $k_0 \in [K]$, $r_0 \in [R_{k_0}]$, and almost all $x_0 \in [0,1]$, we
have

\[ \sum_{r_1,\ldots,r_K} \tilde{G}_{r_1 \cdots r_K} \, \tilde{G}_{r_1 \cdots r_{k_0 - 1} r_0 r_{k_0 + 1} \cdots r_K} \, f^{(k_0)}_{r_{k_0}}(x_0)
= \sum_{r_1,\ldots,r_K : r_{k_0} = r_0} \tilde{G}_{r_1 \cdots r_K} \int_{[0,1]^K : x_{k_0} = x_0} \mathcal{X}(x) \prod_{k \in [K] \setminus \{k_0\}} f^{(k)}_{r_k}(x_k) \, dx \pm \epsilon(k_0, r_0, x_0), \quad (3.5) \]

where $\epsilon(k_0, r_0, x_0) = O(\epsilon)$. Now, we consider a system of linear equations consisting
of (3.5) for $r_0 = 1,\ldots,R_{k_0}$, where the variables are $f^{(k_0)}_1(x_0),\ldots,f^{(k_0)}_{R_{k_0}}(x_0)$. We can
assume that the matrix involved in this system is invertible and has a positive condition
number. For any $k \in [K]$, $r \in [R_k]$ and almost every pair $x, x' \in [0,1]$ with
$i_{N_k}(x) = i_{N_k}(x')$, we then have $f^{(k)}_r(x) = f^{(k)}_r(x') \pm O(\epsilon/\delta')$. For each $k \in [K]$, we can
define a matrix $U^{(k)} \in \mathbb{R}^{N_k \times R_k}$ as $U^{(k)}_{ir} = f^{(k)}_r(x)$, where $x \in [0,1]$ is an arbitrary
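value with $i_{N_k}(x) = i$. Then, we have

\[ \begin{aligned}
\frac{1}{N} \left| X - [\![ \tilde{G}; U^{(1)},\ldots,U^{(K)} ]\!] \right|_F^2
&= \frac{1}{N} \sum_{i_1,\ldots,i_K} \left( X_{i_1 \cdots i_K} - [\![ \tilde{G}; U^{(1)},\ldots,U^{(K)} ]\!]_{i_1 \cdots i_K} \right)^2 \\
&= \int_{[0,1]^K} \left( \mathcal{X}(x) - [\![ \tilde{G}; f^{(1)},\ldots,f^{(K)} ]\!](x) \pm O\!\left( \frac{\epsilon}{\delta'} \right) \right)^2 dx \\
&= \left| \mathcal{X} - [\![ \tilde{G}; f^{(1)},\ldots,f^{(K)} ]\!] \right|_F^2 \pm O\!\left( \frac{\epsilon}{\delta'} \right)
\end{aligned} \]

for $N = \prod_{k \in [K]} N_k$. As the choices of $\epsilon$ and $\delta$ are arbitrary, we obtain (LHS) $\le$ (RHS).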

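Second, we show that (RHS) $\le$ (LHS). Let $U^{(k)} \in \mathbb{R}^{N_k \times R_k}$ ($k \in [K]$) be matrices.
We define a vector-valued function $f^{(k)} : [0,1] \to \mathbb{R}^{R_k}$ as $f^{(k)}_r(x) = U^{(k)}_{i_{N_k}(x)\, r}$ for each
$k \in [K]$ and $r \in [R_k]$. Then, we have

\[ \begin{aligned}
\left| \mathcal{X} - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!] \right|_F^2
&= \int_{[0,1]^K} \left( \mathcal{X}(x) - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!](x) \right)^2 dx \\
&= \sum_{i_1,\ldots,i_K} \int_{\prod_{k \in [K]} I^{N_k}_{i_k}} \left( \mathcal{X}(x) - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!](x) \right)^2 dx \\
&= \frac{1}{N} \sum_{i_1,\ldots,i_K} \left( X_{i_1 \cdots i_K} - [\![ G; U^{(1)},\ldots,U^{(K)} ]\!]_{i_1 \cdots i_K} \right)^2 \\
&= \frac{1}{N} \left| X - [\![ G; U^{(1)},\ldots,U^{(K)} ]\!] \right|_F^2,
\end{aligned} \]

from which the claim follows.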


3.4.4 Proof of Lemma 3.5 


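For a sequence of functions $f^{(1)},\ldots,f^{(K)}$, we define their tensor product
$\bigotimes_{k \in [K]} f^{(k)} : [0,1]^K \to \mathbb{R}$ as
$\left( \bigotimes_{k \in [K]} f^{(k)} \right)(x_1,\ldots,x_K) = \prod_{k \in [K]} f^{(k)}(x_k)$, which is a dikernel
of order $K$.


The cut norm is useful for bounding the absolute value of the inner product
between a tensor and a tensor product:


Lemma 3.7 Let $\epsilon \ge 0$ and $\mathcal{W} : [0,1]^K \to \mathbb{R}$ be a dikernel with $|\mathcal{W}|_\square \le \epsilon$. Then,
for any functions $f^{(1)},\ldots,f^{(K)} : [0,1] \to [-L, L]$, we have
$\left| \left\langle \mathcal{W}, \bigotimes_{k \in [K]} f^{(k)} \right\rangle \right| \le \epsilon L^K$.


Proof For $\tau \in \mathbb{R}$ and a function $h : [0,1] \to \mathbb{R}$, let $L_\tau(h) := \{x \in [0,1] \mid h(x) = \tau\}$
be the level set of $h$ at $\tau$. For $f'^{(k)} = f^{(k)}/L$, we have

\[ \begin{aligned}
\left| \left\langle \mathcal{W}, \bigotimes_{k \in [K]} f^{(k)} \right\rangle \right|
&= L^K \left| \left\langle \mathcal{W}, \bigotimes_{k \in [K]} f'^{(k)} \right\rangle \right| \\
&= L^K \left| \int_{[-1,1]^K} \prod_{k \in [K]} \tau_k \int_{\prod_{k \in [K]} L_{\tau_k}(f'^{(k)})} \mathcal{W}(x) \, dx \, d\tau \right| \\
&\le L^K \int_{[-1,1]^K} \prod_{k \in [K]} |\tau_k| \left| \int_{\prod_{k \in [K]} L_{\tau_k}(f'^{(k)})} \mathcal{W}(x) \, dx \right| d\tau \\
&\le \epsilon L^K \int_{[-1,1]^K} \prod_{k \in [K]} |\tau_k| \, d\tau = \epsilon L^K.
\end{aligned} \]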
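Thus, we have the following:

Lemma 3.8 Let $\mathcal{X}, \mathcal{Y} : [0,1]^K \to \mathbb{R}$ be dikernels with $|\mathcal{X} - \mathcal{Y}|_\square \le \epsilon$ and
$|\mathcal{X}^2 - \mathcal{Y}^2|_\square \le \epsilon$, where $\mathcal{X}^2(x) = \mathcal{X}(x)^2$ and $\mathcal{Y}^2(x) = \mathcal{Y}(x)^2$ for every $x \in [0,1]^K$. Then,
for any tensor $G \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ and a set of vector-valued functions
$F = \{f^{(k)} : [0,1] \to \mathbb{R}^{R_k}\}_{k \in [K]}$, we have

\[ \left| \mathcal{X} - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!] \right|_F^2 = \left| \mathcal{Y} - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!] \right|_F^2 \pm \epsilon \left( 1 + 2R |G|_{\max} |F|^K_{\max} \right), \]

where $R = \prod_{k \in [K]} R_k$.

Proof We have

\[ \begin{aligned}
&\left| \left| \mathcal{X} - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!] \right|_F^2 - \left| \mathcal{Y} - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!] \right|_F^2 \right| \\
&\quad = \left| \int_{[0,1]^K} \left( \mathcal{X}(x) - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!](x) \right)^2 dx - \int_{[0,1]^K} \left( \mathcal{Y}(x) - [\![ G; f^{(1)},\ldots,f^{(K)} ]\!](x) \right)^2 dx \right| \\
&\quad = \left| \int_{[0,1]^K} \left( \mathcal{X}(x)^2 - \mathcal{Y}(x)^2 \right) dx - 2 \int_{[0,1]^K} \left( \mathcal{X}(x) - \mathcal{Y}(x) \right) [\![ G; f^{(1)},\ldots,f^{(K)} ]\!](x) \, dx \right| \\
&\quad \le |\mathcal{X}^2 - \mathcal{Y}^2|_\square + 2 \left| \sum_{r_1 \in [R_1],\ldots,r_K \in [R_K]} G_{r_1 \cdots r_K} \left\langle \mathcal{X} - \mathcal{Y}, \bigotimes_{k \in [K]} f^{(k)}_{r_k} \right\rangle \right| \\
&\quad \le \epsilon + 2 \epsilon R |G|_{\max} |F|^K_{\max}
\end{aligned} \]

by Lemma 3.7.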
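Proof (of Lemma 3.5) By Lemma 3.8, we have

\[ \begin{aligned}
\left| \mathcal{Y} - [\![ G_{\mathcal{Y}}; f^{(1)}_{\mathcal{Y}},\ldots,f^{(K)}_{\mathcal{Y}} ]\!] \right|_F^2
&\le \left| \mathcal{Y} - [\![ G_{\mathcal{X}}; f^{(1)}_{\mathcal{X}},\ldots,f^{(K)}_{\mathcal{X}} ]\!] \right|_F^2 + \epsilon \\
&\le \left| \mathcal{X} - [\![ G_{\mathcal{X}}; f^{(1)}_{\mathcal{X}},\ldots,f^{(K)}_{\mathcal{X}} ]\!] \right|_F^2 + \left( 2\epsilon + 2\epsilon R |G_{\mathcal{X}}|_{\max} |F_{\mathcal{X}}|^K_{\max} \right).
\end{aligned} \]

Similarly, we have

\[ \begin{aligned}
\left| \mathcal{X} - [\![ G_{\mathcal{X}}; f^{(1)}_{\mathcal{X}},\ldots,f^{(K)}_{\mathcal{X}} ]\!] \right|_F^2
&\le \left| \mathcal{X} - [\![ G_{\mathcal{Y}}; f^{(1)}_{\mathcal{Y}},\ldots,f^{(K)}_{\mathcal{Y}} ]\!] \right|_F^2 + \epsilon \\
&\le \left| \mathcal{Y} - [\![ G_{\mathcal{Y}}; f^{(1)}_{\mathcal{Y}},\ldots,f^{(K)}_{\mathcal{Y}} ]\!] \right|_F^2 + \left( 2\epsilon + 2\epsilon R |G_{\mathcal{Y}}|_{\max} |F_{\mathcal{Y}}|^K_{\max} \right).
\end{aligned} \]

Hence, the claim follows.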


References 


1. L. Bottou, Stochastic learning, in Advanced Lectures on Machine Learning (2004), pp. 146-168
2. K.L. Clarkson, E. Hazan, D.P. Woodruff, Sublinear optimization for machine learning. J. ACM
   59(5), 23:1-23:49 (2012)
3. L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and rank-(R1, R2, ..., RN)
   approximation of higher-order tensors. SIAM J. Matrix Anal. Appl. 21(4), 1324-1342 (2000)
4. A. Frieze, R. Kannan, The regularity lemma and approximation schemes for dense problems,
   in FOCS (1996), pp. 12-20
5. K. Hayashi, Y. Yoshida, Minimizing quadratic functions in constant time, in NIPS (2016),
   pp. 2217-2225
6. K. Hayashi, Y. Yoshida, Fitting low-rank tensors in constant time, in NIPS (2017),
   pp. 2473-2481
7. L. Lovász, Large Networks and Graph Limits (American Mathematical Society, 2012)
8. K.P. Murphy, Machine Learning: A Probabilistic Perspective (The MIT Press, 2012)
9. T. Suzuki, M. Sugiyama, Least-squares independent component analysis. Neural Comput.
   23(1), 284-301 (2011)
10. M. Sugiyama, T. Suzuki, T. Kanamori, Density Ratio Estimation in Machine Learning
    (Cambridge University Press, 2012)
11. L.R. Tucker, Some mathematical notes on three-mode factor analysis. Psychometrika
    31(3), 279-311 (1966)
12. C.K.I. Williams, M. Seeger, Using the Nyström method to speed up kernel machines, in NIPS (2001)
13. M. Yamada, T. Suzuki, T. Kanamori, H. Hachiya, M. Sugiyama, Relative density-ratio
    estimation for robust distribution comparison, in NIPS (2011)


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 


The images or other third party material in this chapter are included in the chapter's Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter's Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Chapter 4
Oracle-Based Primal-Dual Algorithms
for Packing and Covering Semidefinite
Programs


Khaled Elbassioni and Kazuhisa Makino 


Abstract Packing and covering semidefinite programs (SDPs) appear in natural 
relaxations of many combinatorial optimization problems as well as a number of 
other applications. Recently, several techniques have been proposed that utilize the 
particular structure of this class of problems in order to obtain more efficient algo- 
rithms than those offered by general SDP solvers. For certain applications, it may 
be necessary to deal with SDPs with a very large number of (e.g., exponentially or 
even infinitely many) constraints. In this chapter, we give an overview of some of the 
techniques that can be used to solve this class of problems, focusing on multiplicative 
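weight updates and logarithmic-potential methods.

We denote by $\mathbb{S}^n$ the set of all $n \times n$ real symmetric matrices and by $\mathbb{S}^n_+ \subseteq \mathbb{S}^n$ the set
of all $n \times n$ positive semidefinite (psd) matrices. We consider the following pairs of
packing-covering semidefinite programs (SDPs):

\[ \begin{aligned}
z^*_I = \max \; & C \bullet X && \text{(PACKING-I)} \\
\text{s.t.} \; & A_i \bullet X \le b_i, \quad \forall i \in [m] \\
& X \in \mathbb{S}^n, \; X \succeq 0
\end{aligned}
\qquad
\begin{aligned}
z^*_I = \min \; & b^T y && \text{(COVERING-I)} \\
\text{s.t.} \; & \sum_{i=1}^m y_i A_i \succeq C \\
& y \in \mathbb{R}^m, \; y \ge 0,
\end{aligned} \]

\[ \begin{aligned}
z^*_{II} = \min \; & C \bullet X && \text{(COVERING-II)} \\
\text{s.t.} \; & A_i \bullet X \ge b_i, \quad \forall i \in [m] \\
& X \in \mathbb{S}^n, \; X \succeq 0
\end{aligned}
\qquad
\begin{aligned}
z^*_{II} = \max \; & b^T y && \text{(PACKING-II)} \\
\text{s.t.} \; & \sum_{i=1}^m y_i A_i \preceq C \\
& y \in \mathbb{R}^m, \; y \ge 0,
\end{aligned} \]

where $C, A_1,\ldots,A_m \in \mathbb{S}^n_+$ are (non-zero) psd matrices, and $b = (b_1,\ldots,b_m)^T \in \mathbb{R}^m_+$
is a non-negative vector. In the above, $C \bullet X := \operatorname{Tr}(CX) = \sum_{i=1}^n \sum_{j=1}^n c_{ij} x_{ij}$,
and "$\succeq$" is the Löwner order on matrices: $A \succeq B$ if and only if $A - B$ is psd. This
type of SDP arises in many applications. See, for example, [14, 15] and the references
therein.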

We assume the following throughout this chapter: 


(A) $b_i > 0$ and hence $b_i = 1$ for all $i \in [m]$.


It is known that, under assumption (A), strong duality holds for problems (PACKING-I)
and (COVERING-I) (resp., (PACKING-II) and (COVERING-II)). Let $\epsilon \in (0,1]$ be a
given constant. We say that $(X, y)$ is an $\epsilon$-optimal primal-dual solution for
(PACKING-I)-(COVERING-I) if $(X, y)$ is a primal-dual feasible pair such that

\[ C \bullet X \ge (1 - \epsilon) b^T y \ge (1 - \epsilon) z^*_I. \quad (4.1) \]

Similarly, we say that $(X, y)$ is an $\epsilon$-optimal primal-dual solution for
(PACKING-II)-(COVERING-II) if $(X, y)$ is a primal-dual feasible pair such that

\[ C \bullet X \le (1 + \epsilon) b^T y \le (1 + \epsilon) z^*_{II}. \quad (4.2) \]


In this chapter, we allow the number of constraints $m$ in (PACKING-I) (resp.,
(COVERING-II)) to be exponentially (or even infinitely) large, so we assume the
availability of the following oracle:

Max($Y$) (resp., Min($Y$)): Given $Y \in \mathbb{S}^n_+$, find $i \in \operatorname{argmax}_{i \in [m]} A_i \bullet Y$
(resp., $i \in \operatorname{argmin}_{i \in [m]} A_i \bullet Y$).

Note that an approximation oracle computing the above maximum (resp., minimum)
within a factor of $(1 - \epsilon)$ (resp., $(1 + \epsilon)$) is also sufficient for our purposes.
A primal-dual solution $(X, y)$ to (COVERING-I) (resp., (PACKING-II)) is said to be
$\eta$-sparse if the size of $\operatorname{supp}(y) := \{i \in [m] : y_i > 0\}$ is at most $\eta$.
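
As a minimal illustration of the oracle interface, the sketch below implements Max($Y$) and
Min($Y$) for the simplest possible setting, in which the constraint matrices are given as an
explicit finite list. This is an assumption made for illustration only: when $m$ is exponential
or infinite, the oracle would instead exploit the structure of the constraint family. The
function names are hypothetical, and indices are 0-based.

```python
def frobenius_inner(A, Y):
    """A . Y = Tr(AY) = sum_ij a_ij * y_ij for symmetric matrices (lists of lists)."""
    return sum(a * y for row_a, row_y in zip(A, Y) for a, y in zip(row_a, row_y))


def max_oracle(constraints, Y):
    """Return an index i maximizing A_i . Y over an explicit list of matrices."""
    return max(range(len(constraints)),
               key=lambda i: frobenius_inner(constraints[i], Y))


def min_oracle(constraints, Y):
    """Return an index i minimizing A_i . Y over an explicit list of matrices."""
    return min(range(len(constraints)),
               key=lambda i: frobenius_inner(constraints[i], Y))
```

With an explicit list, each call costs $O(mn^2)$ arithmetic operations, which is exactly the
dependence on $m$ that a problem-specific oracle is meant to avoid.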

When $C = I = I_n$ (which is the identity matrix in $\mathbb{R}^{n \times n}$) and $b = \mathbf{1}_m$ (which is
the vector containing all ones in $\mathbb{R}^m$), we say that the packing-covering SDPs are
in normalized form. It can be shown (see, e.g., [7, 16]) that, to within a multiplicative
factor of $(1 + \epsilon)$ in the objective, any pair of packing-covering SDPs of the
form (PACKING-I)-(COVERING-I) can be brought to normalized form in $O(n^3)$ time
while increasing the oracle time by only $O(n^\omega)$, where $\omega$ is the exponent of matrix
multiplication, under the following assumption:

(B-I) There exist $r$ matrices, say $A_1,\ldots,A_r$, such that $\bar{A} := \sum_{i=1}^r A_i \succ 0$. In
particular, $\operatorname{Tr}(X) \le \tau := \frac{r}{\lambda_{\min}(\bar{A})}$ for any optimal solution $X$ for (PACKING-I), and


we may assume that r = 1 and A; = 11. 


Similarly, it can be shown that, to within a multiplicative factor of $(1 + \epsilon)$ in the
objective, any pair of packing-covering SDPs of the form (PACKING-II)-(COVERING-II)
can be brought to normalized form in $O(n^3)$ time, while increasing the oracle
time by only $O(n^\omega)$. Moreover, we may assume in this normalized form that


(B-II) $\lambda_{\min}(A_i) = \Omega\!\left( \frac{\epsilon}{n} \cdot \min_{i'} \lambda_{\max}(A_{i'}) \right)$ for all $i \in [m]$,


where, for a psd matrix $B \in \mathbb{S}^n_+$, we denote by $\{\lambda_j(B) : j = 1,\ldots,n\}$ the eigenvalues
of $B$, and by $\lambda_{\min}(B)$ and $\lambda_{\max}(B)$ the minimum and maximum eigenvalues of
$B$, respectively. Given additional $O(mn^2)$ time, we may also assume that


(B-II') $\frac{\lambda_{\max}(A_i)}{\lambda_{\min}(A_i)} = O\!\left( \frac{n^2}{\epsilon^2} \right)$ for all $i \in [m]$.


Thus, the remainder of this chapter focuses on normalized problems. 


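Mixed packing and covering SDPs.
We also consider the following mixed packing-covering feasibility SDPs:

$$A_i \bullet X \le b_i, \quad \forall i \in [m_p] \qquad \text{(MIX-PACK-COVER)}$$
$$B_i \bullet X \ge d_i, \quad \forall i \in [m_c]$$
$$X \in \mathbb{S}^n, \quad X \succeq 0,$$

where $A_1, \ldots, A_{m_p}, B_1, \ldots, B_{m_c} \in \mathbb{R}^{n\times n}$ are psd matrices, and $b = (b_1, \ldots, b_{m_p})^T$, $d = (d_1, \ldots, d_{m_c})^T$ are non-negative real vectors.

A matrix $X \in \mathbb{S}^n_+$ is an $\epsilon$-approximate solution for (MIX-PACK-COVER) if $A_i \bullet X \le b_i$ for all $i \in [m_p]$ and $B_i \bullet X \ge (1-\epsilon) d_i$ for all $i \in [m_c]$.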
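
The definition of an $\epsilon$-approximate solution can be checked mechanically. The following is a minimal sketch, assuming matrices are stored as nested Python lists; the function names `frob` and `is_eps_approximate` and the tolerance `tol` are illustrative and not taken from the chapter. Positive semidefiniteness of $X$ is assumed rather than verified.

```python
def frob(A, X):
    """Frobenius inner product A • X = sum_ij A[i][j] * X[i][j]."""
    return sum(a * x for row_a, row_x in zip(A, X) for a, x in zip(row_a, row_x))


def is_eps_approximate(X, packing, b, covering, d, eps, tol=1e-12):
    """Check A_i • X <= b_i for all i and B_i • X >= (1 - eps) * d_i for all i.

    X is assumed to be psd; that property is not checked here.
    """
    pack_ok = all(frob(A, X) <= b_i + tol for A, b_i in zip(packing, b))
    cover_ok = all(frob(B, X) >= (1.0 - eps) * d_i - tol
                   for B, d_i in zip(covering, d))
    return pack_ok and cover_ok


# Toy instance (hypothetical data): one packing and one covering constraint.
A1 = [[1.0, 0.0], [0.0, 0.0]]
B1 = [[0.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 0.95]]
```

With this data, $X$ satisfies the packing constraint exactly and misses the covering constraint by 5%, so it is accepted for $\epsilon = 0.1$ and rejected for $\epsilon = 0.01$.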


4.2 Applications 


4.2.1 SDP relaxation for Robust MAXCUT 


Given a simple undirected graph $G = (V, E)$ on $n = |V|$ vertices with non-negative edge weights $w \in \mathbb{R}^E_+$, the objective in the well-known MAXCUT problem is to find a subset of the vertices $X \subseteq V$ that maximizes the weight of the cut: $w(X, V \setminus X) := \sum_{u \in X,\, v \in V \setminus X} w_{uv}$. The best-known approximation algorithm (with approximation ratio 0.878...) [10] for MAXCUT is based on the following SDP relaxation:

$$\max\ L(w) \bullet X \qquad \text{(MAXCUT-SDP)}$$
$$\text{s.t.}\quad \mathbf{1}_i \mathbf{1}_i^T \bullet X = 1, \quad \forall i \in [n] \qquad (4.3)$$
$$X \in \mathbb{R}^{n\times n}, \quad X \succeq 0.$$

By simply changing the equality in (4.3) into an inequality, this can be written in the form (PACKING-I), with $A_i := \mathbf{1}_i \mathbf{1}_i^T$ and $C := L(w) \succeq 0$ being the Laplacian matrix of $G$, defined as follows:

$$L_{ij}(w) = \begin{cases} \sum_{k : \{i,k\} \in E} w_{ik} & \text{if } i = j, \\ -w_{ij} & \text{if } \{i, j\} \in E, \\ 0 & \text{otherwise.} \end{cases}$$

Based on this relaxation, the following result is obtained using the scalar multiplica- 
tive weights update (MWU) method: 


Theorem 4.1 ([18]) There is a randomized algorithm for finding an €-optimal solu- 
tion for (MAXCUT- SDP) in time O(" 2), where п and m respectively denote the 


РЕ 
number of vertices and edges in a given graph.


Under the robust optimization framework, one assumes the weights are not known precisely, but instead are given by a convex uncertainty set $\mathcal{W} \subseteq \mathbb{R}^E_+$, where it is necessary to find a (near)-optimal solution under the worst-case choice $w \in \mathcal{W}$ in the uncertainty set:

$$\max\ \min_{w \in \mathcal{W}} L(w) \bullet X \qquad \text{(ROBUST-MAXCUT-SDP)}$$
$$\text{s.t.}\quad \mathbf{1}_i \mathbf{1}_i^T \bullet X = 1, \quad \forall i \in [n] \qquad (4.4)$$
$$X \in \mathbb{R}^{n\times n}, \quad X \succeq 0.$$

By "guessing" the value $\tau$ of an optimal solution (via binary search), (4.4) can be reduced to

$$\min\ I \bullet X$$
$$\text{s.t.}\quad \mathbf{1}_i \mathbf{1}_i^T \bullet X \ge 1, \quad \forall i \in [n]$$
$$\frac{1}{\tau} L(w) \bullet X \ge 1, \quad \forall w \in \mathcal{W}$$
$$X \in \mathbb{R}^{n\times n}, \quad X \succeq 0.$$

Thus, we obtain a covering SDP (of type (COVERING-II)) with an infinite number of constraints, given by a minimization oracle over the convex set $\mathcal{W}$. We can use the matrix logarithmic-potential method to obtain the following result:


Theorem 4.2 There is a randomized algorithm that finds an €-optimal solution for 

Doo Af got | | A ; . 
(4.4) in time O (5 + "T, where T is the time needed to optimize a linear function 
over W. 


Note that for this reduction to remain valid, it is sufficient to find an $\epsilon$-optimal solution to (4.4) for any $\epsilon = o(1)$.
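
The minimization oracle over the uncertainty set only needs to minimize a linear function of $w$, because $L(w) \bullet X = \sum_{\{i,j\} \in E} w_{ij}(X_{ii} + X_{jj} - 2X_{ij})$ for symmetric $X$. The sketch below illustrates this under a simplifying assumption that is not made in the chapter: the uncertainty set is the convex hull of finitely many weight scenarios, so a linear function attains its minimum at one of them. All names and data are hypothetical.

```python
def laplacian_dot(weights, X):
    """L(w) • X, computed edge by edge as w_ij * (X_ii + X_jj - 2 X_ij)."""
    return sum(w * (X[i][i] + X[j][j] - 2.0 * X[i][j])
               for (i, j), w in weights.items())


def worst_case_weights(scenarios, X):
    """Minimization oracle: the scenario w minimizing the linear function L(w) • X.

    Assumes the uncertainty set is the convex hull of `scenarios`.
    """
    return min(scenarios, key=lambda w: laplacian_dot(w, X))


# Two weight scenarios on a path 0 - 1 - 2 (hypothetical data).
scenarios = [
    {(0, 1): 1.0, (1, 2): 4.0},
    {(0, 1): 3.0, (1, 2): 1.0},
]
# X = x x^T for x = (1, -1, 1), which cuts both edges.
X = [[1.0, -1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, 1.0]]
```

For this $X$ both edges contribute a factor of 4, so the first scenario gives 20 and the second gives 16; the oracle returns the second scenario. For other convex sets the same call would be replaced by whatever linear optimization routine is available for that set.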


4.2.2 Mahalanobis Distance Learning 


Given a psd matrix $X \in \mathbb{S}^n_+$, the $X$-Mahalanobis distance between two points $a, b \in \mathbb{R}^n$ is defined as

$$d_X(a, b) := \sqrt{(a - b)^T X (a - b)}.$$

The distance function $d_X(\cdot, \cdot)$ is a semi-metric; that is, it is symmetric ($d_X(a, b) = d_X(b, a)$) and satisfies the triangle inequality ($d_X(a, c) \le d_X(a, b) + d_X(b, c)$), and it is also a metric if $X \succ 0$ (as in this case, $d_X(a, b) = 0$ if and only if $a = b$).

The Mahalanobis distance learning problem is defined as follows [28]: Given sets $C_s$ and $C_d$ of similar and dissimilar pairs of points in $\mathbb{R}^n$, respectively, a similarity parameter $\sigma_s \in \mathbb{R}_+$ and a dissimilarity parameter $\sigma_d \in \mathbb{R}_+$, the objective is to find a matrix $X$ such that all the pairs in $C_s$ are "close" and all the pairs in $C_d$ are "far" with respect to the distance function $d_X(\cdot, \cdot)$:

$$(a - b)^T X (a - b) \le \sigma_s, \quad \forall (a, b) \in C_s \qquad (4.5)$$
$$(a - b)^T X (a - b) \ge \sigma_d, \quad \forall (a, b) \in C_d \qquad (4.6)$$
$$X \in \mathbb{S}^n, \quad X \succeq 0. \qquad (4.7)$$

Note that this can be written in the form (MIX-PACK-COVER), with $|C_s|$ packing constraints of the form $A_{a,b} \bullet X \le \sigma_s$, where $A_{a,b} = (a - b)(a - b)^T$ for $(a, b) \in C_s$, and $|C_d|$ covering constraints of the form $B_{a,b} \bullet X \ge \sigma_d$, where $B_{a,b} = (a - b)(a - b)^T$ for $(a, b) \in C_d$.
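
The reduction rests on the identity $(a-b)^T X (a-b) = \big((a-b)(a-b)^T\big) \bullet X$. The following sketch, with hypothetical helper names and toy data, computes the Mahalanobis distance directly and via the rank-one constraint matrix to show that the two agree.

```python
import math


def outer_diff(a, b):
    """Rank-one constraint matrix (a - b)(a - b)^T."""
    d = [ai - bi for ai, bi in zip(a, b)]
    return [[di * dj for dj in d] for di in d]


def frob(A, X):
    """Frobenius inner product A • X."""
    return sum(p * q for row_a, row_x in zip(A, X) for p, q in zip(row_a, row_x))


def mahalanobis(X, a, b):
    """d_X(a, b) = sqrt((a - b)^T X (a - b)) for a psd matrix X."""
    d = [ai - bi for ai, bi in zip(a, b)]
    n = len(d)
    return math.sqrt(sum(d[i] * X[i][j] * d[j]
                         for i in range(n) for j in range(n)))


# Toy data (hypothetical): a diagonal psd matrix and one pair of points.
X = [[2.0, 0.0], [0.0, 0.5]]
a, b = [1.0, 2.0], [0.0, 0.0]
```

Here $(a-b)^T X (a-b) = 2 \cdot 1 + 0.5 \cdot 4 = 4$, so the distance is 2, and `frob(outer_diff(a, b), X)` returns the same squared value. A pair with similarity threshold $\sigma_s \ge 4$ would therefore satisfy its packing constraint.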

We can use the scalar MWU method to obtain the following result: 
Theorem 4.3 There is a deterministic algorithm that finds an €-feasible solution 
for (4.5)-(4.2.2) in time oe), where n is the dimension of the point sets and 
$m := |C_s| + |C_d|$.
We remark that further improvements (possibly by another factor of $O(m)$) are plausible via rank-one tricks and the use of approximate eigenvalue computations.

4.2.3 Related Work

Problems (PACKING-I)-(COVERING-I) and (PACKING-II)-(COVERING-II) can be solved using general SDP solvers, such as interior-point methods. For example, the barrier method (see, e.g., [22]) can compute a solution within an additive error of $\epsilon$ from the optimal in time $O(\sqrt{n}\, m (n^3 + mn^2 + m^2) \log \frac{1}{\epsilon})$ (see also [1, 27]). However, due to the special nature of (PACKING-I)-(COVERING-I) and (PACKING-II)-(COVERING-II), better algorithms can be obtained. Most of the improvements are obtained by using first-order methods [2, 3, 5, 6, 8, 15-18, 21, 23, 24], or second-order methods [13, 14]. In general, we can classify these algorithms according to whether they are (semi) width-independent, are parallel, output sparse solutions, or are oracle-based, as follows.


(I) (Semi) width-independent: The running time of the algorithm depends polynomially on the bit length of the input. For example, in the case of (PACKING-I)-(COVERING-I), the running time is $\mathrm{poly}(n, m, \mathcal{L}, \log \tau, \frac{1}{\epsilon})$, where $\mathcal{L}$ is the maximum bit length needed to represent any number in the input. In contrast, the running time of a width-dependent algorithm depends polynomially on a "width parameter" $\rho$, which is polynomial in $\mathcal{L}$ and $\tau$.

(II) Parallel: The algorithm takes $\mathrm{polylog}(n, m, \mathcal{L}, \log \tau) \cdot \mathrm{poly}(\frac{1}{\epsilon})$ time on a $\mathrm{poly}(n, m, \mathcal{L}, \log \tau, \frac{1}{\epsilon})$ number of processors.

(III) Sparse: The algorithm outputs an $\eta$-sparse solution to (COVERING-I) (resp., (PACKING-II)) for $\eta = \mathrm{poly}(n, \log m, \mathcal{L}, \log \tau, \frac{1}{\epsilon})$ (resp., $\eta = \mathrm{poly}(n, \log m, \mathcal{L}, \frac{1}{\epsilon})$), where $\tau$ is a parameter that bounds the trace of any optimal solution $X$.

(IV) Oracle-based: The only access the algorithm has to the matrices $A_1, \ldots, A_m$ is via the maximization/minimization oracle, and hence the running time is independent of $m$.


Table 4.1 below gives a summary¹ of the most relevant results together with their
classifications according to the four criteria above. We note that almost all of these 
algorithms for packing/covering SDPs are generalizations of similar algorithms for 
packing/covering linear programs (LPs), and most of them are essentially based on 
an exponential potential function in the form of scalar exponentials, such as [3, 18], 
or matrix exponential [2, 5, 6, 15, 17]. For instance, several of these results use the 
scalar or matrix versions of the MWU method (see, e.g., [4]), which are extensions 
of similar methods for packing/covering LPs [9, 11, 25, 29]. 

In [12], a different type of algorithm was given for covering LPs (indeed, more 
generally, for a class of concave covering inequalities) based on a logarithmic poten- 
tial function. In [7], it was shown that this approach could be extended to provide 
sparse solutions for both versions of packing and covering SDPs. 

As we can see from the table, among all the algorithms, only the matrix (MWU and logarithmic-potential) algorithms are oracle-based (and hence produce sparse solutions) in the sense described above. However, the overall running time of the matrix MWU algorithm is larger by a factor of (roughly) $\Omega(n^{3-\omega})$ than that of the logarithmic-potential algorithm, where $\omega$ is the exponent of matrix multiplication. Moreover, we cannot extend the matrix MWU algorithm to solve (PACKING-I)-(COVERING-I) (in particular, it seems tricky to bound the number of iterations).

¹ We provide rough estimates of the bounds, as some of them are not stated explicitly in the corresponding paper in terms of the parameters we consider here.


4.3 General Framework for Packing-Covering SDPs 


Given a pair of packing-covering SDPs (PACKING-I)-(COVERING-I) or (COVERING-II)-(PACKING-II), we consider the following general framework in which each constraint is assigned a weight reflecting how satisfied the constraint is given the current solution:

1  Initialize constraint weights
2  while the stopping criterion is not satisfied do
3  |  Form a "weighted average" of all the inequalities into a single inequality
4  |  /* If we maintain weights for primal → scalar version */
5  |  /* If we maintain weights for dual → matrix version */
6  |  Solve a fractional knapsack problem to determine the direction of the next update
7  |  Update the primal (or sometimes dual) variables in the chosen direction
8  |  Update the weights to reflect which constraints become more satisfied
9  |  /* Weights (essentially) ↔ dual (or sometimes primal) variables */
10 end

Algorithm 1: A general framework for solving packing-covering SDPs


We obtain different algorithms depending on how the weights are defined. We write $a_i := A_i \bullet X \ge 0$. Since $a_{\max} := \max\{a_1, \ldots, a_m\}$ (resp., $a_{\min} := \min\{a_1, \ldots, a_m\}$) is not a smooth function (in $X$), it is more convenient to work with a smooth approximation of it, which is provided by the weighted average formed in step 3 in the framework. There are several ways to do this, for example:

• Exponential averaging: The weights are $p_i := \frac{(1+\epsilon)^{a_i}}{\sum_{i'=1}^{m} (1+\epsilon)^{a_{i'}}}$ (resp., $p_i := \frac{(1-\epsilon)^{a_i}}{\sum_{i'=1}^{m} (1-\epsilon)^{a_{i'}}}$).

The following claim justifies the use of these sets of weights.


1+є т 1 т:аһа 
Lemma 4.1 [fdmax = logi, 2 (resp. Amin = = log т. (2m )), then 


m 


m 
dmax = — 
1 dg < у, Dii < атах (resp. Amin < у, Dii < (1 + аһа). 
і=1 і=1 
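• Logarithmic potential averaging: The weights are $p_i := \frac{\epsilon}{m} \cdot \frac{\theta^*}{\theta^* - a_i}$ (resp., $p_i := \frac{\epsilon}{m} \cdot \frac{\theta^*}{a_i - \theta^*}$), where $\theta^*$ is the minimizer (resp., maximizer) of the potential function

$$\Phi(\theta) := \ln\left(\theta \cdot \prod_{i=1}^{m} (\theta - a_i)^{-\epsilon/m}\right) \quad \left(\text{resp., } \Phi(\theta) := \ln\left(\theta \cdot \prod_{i=1}^{m} (a_i - \theta)^{\epsilon/m}\right)\right).$$

(It can be easily verified that $\sum_{i=1}^{m} p_i = 1$.) The following claim justifies the use of these sets of weights.

Lemma 4.2
$$\frac{(1-\epsilon)\, a_{\max}}{1 - \epsilon/m} \le \sum_{i=1}^{m} p_i a_i \le a_{\max} \quad \left(\text{resp., } a_{\min} \le \sum_{i=1}^{m} p_i a_i \le \frac{(1+\epsilon)\, a_{\min}}{1 + \epsilon/m}\right).$$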
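
Both averaging schemes can be tried numerically. The sketch below is illustrative only: it assumes the exponential weights are proportional to $(1+\epsilon)^{a_i}$ and that the logarithmic-potential weights have the form $p_i = \frac{\epsilon}{m} \cdot \frac{\theta^*}{\theta^* - a_i}$, with $\theta^* > a_{\max}$ the root of $\frac{\epsilon}{m} \sum_i \frac{\theta}{\theta - a_i} = 1$ (which is what setting the derivative of a potential of the form $\ln \theta - \frac{\epsilon}{m} \sum_i \ln(\theta - a_i)$ to zero gives). The root is located by plain bisection, which is a choice made here for simplicity; function names are hypothetical.

```python
def exponential_weights(a, eps):
    """Weights proportional to (1 + eps)^{a_i}, normalized to sum to 1."""
    a_max = max(a)
    raw = [(1.0 + eps) ** (ai - a_max) for ai in a]  # shift avoids overflow
    total = sum(raw)
    return [r / total for r in raw]


def log_potential_weights(a, eps, iters=100):
    """Weights (eps/m) * theta / (theta - a_i), theta found by bisection.

    Assumes max(a) > 0 and 0 < eps < 1. The root lies in
    (a_max, a_max / (1 - eps)], where the defining function is decreasing.
    """
    m, a_max = len(a), max(a)

    def g(theta):
        return (eps / m) * sum(theta / (theta - ai) for ai in a) - 1.0

    lo, hi = a_max, a_max / (1.0 - eps)   # g -> +inf near lo, g(hi) <= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    theta = hi
    return theta, [(eps / m) * theta / (theta - ai) for ai in a]


a = [1.0, 2.0, 4.0]          # hypothetical values a_i = A_i • X
eps = 0.1
q = exponential_weights(a, eps)
theta, p = log_potential_weights(a, eps)
avg_exp = sum(qi * ai for qi, ai in zip(q, a))
avg_log = sum(pi * ai for pi, ai in zip(p, a))
```

In both cases the weighted average lies below $a_{\max}$ and above the plain mean, and for the logarithmic-potential weights it also respects the lower bound $(1-\epsilon)a_{\max}/(1-\epsilon/m)$ under the assumptions stated above.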


4.4 Scalar Algorithms 


4.4.1 Scalar MWU Algorithm for (PACKING-I)-(COVERING-I)

Given a normalized pair of packing-covering SDPs of type I, (PACKING-I)-(COVERING-I), and a feasible primal solution $X$, we use the exponential weights $p_i := (1+\epsilon)^{A_i \bullet X}$ for $i \in [m]$. Averaging the inequalities with respect to the weights $\bar{p}_i := \frac{p_i}{\sum_{i'} p_{i'}}$, we arrive at the following problem:

$$\max\ I \bullet X \qquad (4.8)$$
$$\text{s.t.}\quad \sum_{i} \bar{p}_i A_i \bullet X \le 1$$
$$X \in \mathbb{R}^{n\times n}, \quad X \succeq 0.$$

Letting $\bar{A} := \sum_i \bar{p}_i A_i$ and writing $X = \sum_{v \in B_n} \lambda_v vv^T$, where $B_n := \{v \in \mathbb{R}^n : \|v\| = 1\}$ and $\lambda_v \ge 0$ for all $v \in B_n$, we obtain the following (infinite-dimensional) knapsack problem

$$\max\ \sum_{v \in B_n} \lambda_v \qquad (4.9)$$
$$\text{s.t.}\quad \sum_{v \in B_n} \lambda_v\, \bar{A} \bullet vv^T \le 1$$
$$\lambda_v \ge 0, \quad \forall v \in B_n.$$

An optimal solution is attained at a vector $v \in B_n$ which minimizes $v^T \bar{A} v$. This is an eigenvector corresponding to $\lambda_{\min}(\bar{A})$.

Thus, using this set of weights in our general framework (Algorithm 1) yields the following procedure (for a vector $p \in \mathbb{R}^m$, we write $\bar{p}_i := \frac{p_i}{\sum_{i'} p_{i'}}$):

1  $t \leftarrow 0$; $X(0) \leftarrow 0$; $y(0) \leftarrow 0$; $M(0) \leftarrow 0$; $T \leftarrow \epsilon^{-2} \ln m$
2  while $M(t) < T$ do
3  |  $p_i(t) \leftarrow (1+\epsilon)^{A_i \bullet X(t)}$ for $i \in [m]$   /* Update the weights */
4  |  $v(t) \leftarrow \mathrm{argmin}_{v \in B_n} \sum_i \bar{p}_i(t) A_i \bullet vv^T$   /* Find an eigenvector corresponding to the smallest eigenvalue of the average inequality matrix */
5  |  $\delta(t) \leftarrow 1 / \max_i A_i \bullet v(t)v(t)^T$   /* Define the update step size */
6  |  $X(t+1) \leftarrow X(t) + \delta(t) v(t)v(t)^T$; $y(t+1) \leftarrow y(t) + \delta(t)\, \bar{p}(t)$   /* Update the primal-dual solutions */
7  |  $M(t+1) \leftarrow \max_i A_i \bullet X(t+1)$   /* Compute the largest LHS */
8  |  $t \leftarrow t + 1$
9  end


oU m 


У 5 X(t) y(t) 
10 output (X, у) = (58. aero) 


Algorithm 2: Scalar MWU algorithm for (PACKING- I)-(COVERING- I) 
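
The primal part of the loop above can be sketched in a few lines of plain Python. This is an illustrative sketch under several assumptions, not the chapter's implementation: the eigenvector oracle is replaced by a simple power iteration on $cI - \bar{A}$ (so it is only approximate), only the primal iterate is tracked, and the final scaling divides $X$ by the largest left-hand side so that the returned matrix is feasible. All function names and the toy instance are hypothetical.

```python
import math


def frob(A, X):
    """Frobenius inner product A • X."""
    return sum(p * q for ra, rx in zip(A, X) for p, q in zip(ra, rx))


def quad(A, v):
    """v^T A v = A • (v v^T)."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


def min_eigvec(A, iters=500):
    """Approximate eigenvector for lambda_min(A) via power iteration on c*I - A."""
    n = len(A)
    c = sum(A[i][i] for i in range(n)) + 1.0      # c > lambda_max(A) for psd A
    v = [1.0 + 0.1 * i for i in range(n)]         # fixed, non-symmetric start
    for _ in range(iters):
        w = [c * v[i] - sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def scalar_mwu(As, eps):
    """Primal iterate of a scalar MWU loop, scaled to be feasible.

    Assumes m >= 2 psd matrices whose sum is positive definite.
    """
    m, n = len(As), len(As[0])
    X = [[0.0] * n for _ in range(n)]
    lhs = [0.0] * m                               # lhs[i] tracks A_i • X(t)
    T = math.log(m) / eps ** 2                    # stopping threshold
    while max(lhs) < T:
        shift = max(lhs)                          # common factor cancels on normalizing
        p = [(1.0 + eps) ** (l - shift) for l in lhs]
        total = sum(p)
        Abar = [[sum(p[i] / total * As[i][r][c] for i in range(m))
                 for c in range(n)] for r in range(n)]
        v = min_eigvec(Abar)
        gains = [quad(A, v) for A in As]
        delta = 1.0 / max(gains)                  # largest LHS grows by exactly 1
        for r in range(n):
            for c in range(n):
                X[r][c] += delta * v[r] * v[c]
        lhs = [l + delta * g for l, g in zip(lhs, gains)]
    M = max(lhs)
    return [[x / M for x in row] for row in X]


# Toy instance (hypothetical): optimum of max Tr(X) s.t. A_i • X <= 1 is 1.6.
A1 = [[1.0, 0.0], [0.0, 0.25]]
A2 = [[0.25, 0.0], [0.0, 1.0]]
Xhat = scalar_mwu([A1, A2], 0.5)
```

By construction the tightest constraint of `Xhat` has left-hand side 1, so `Xhat` is feasible for the packing problem and its trace cannot exceed the optimum. How close the trace gets to the optimum depends on $\epsilon$ and on the accuracy of the eigenvector routine.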


The stopping criterion is that the left-hand side (LHS) of at least one inequality in (PACKING-I) reaches the threshold $T := \epsilon^{-2} \ln m$ with respect to the current solution $X(t)$. The step size (step 5) is chosen such that, in each iteration of the while-loop, the LHS of at least one inequality increases by exactly 1 (and no LHS increases by more than 1), thus guaranteeing termination in $O(mT)$ iterations.

Theorem 4.4 Given a real $\epsilon \in (0, 1]$, Algorithm 2 outputs an $\epsilon$-optimal solution for (PACKING-I)-(COVERING-I) in $O(m \log m / \epsilon^2)$ iterations, where each iteration requires an oracle call that computes an eigenvector corresponding to the minimum eigenvalue of a psd matrix.

For a given matrix $M \in \mathbb{R}^{n\times n}$, computing $\lambda_{\min}(M)$ (almost) exactly requires $O(n^3)$ time via a full eigenvalue decomposition of the matrix. If $M$ is psd, a faster approximation of $\lambda_{\max}(M)$ can be obtained (using Lanczos' algorithm with a random start) via the following result.

Theorem 4.5 ([19]) Let $M \in \mathbb{S}^n_+$ be a psd matrix with $N$ non-zeros and $\gamma \in (0, 1)$ be a given constant. Then, there is a randomized algorithm that computes, with high (i.e., $1 - o(1)$) probability, a unit vector $v \in \mathbb{R}^n$ such that $v^T M v \ge (1 - \gamma) \lambda_{\max}(M)$.


The algorithm takes o( 5) iterations, each requiring O (N) arithmetic operations. 


By applying the lemma to (A), we can approximate Аъ (A) in Ó (n?) time. 
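
To illustrate the kind of guarantee stated in Theorem 4.5, the sketch below approximates a top eigenvector with a randomly started power iteration. This is a simpler stand-in chosen here for brevity, not the Lanczos method of [19], and it does not achieve the same iteration bound; each iteration is one matrix-vector product. The names, the seed, and the test matrix are hypothetical.

```python
import math
import random


def approx_top_eigvec(M, iters=200, seed=0):
    """Unit vector approximately aligned with the top eigenvector of psd M.

    Plain power iteration with a Gaussian random start (not Lanczos).
    """
    rng = random.Random(seed)
    n = len(M)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:          # M v = 0: keep the current unit vector
            break
        v = [x / norm for x in w]
    return v


def rayleigh(M, v):
    """Rayleigh quotient v^T M v for a unit vector v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


M = [[2.0, 1.0], [1.0, 2.0]]     # eigenvalues 1 and 3
v = approx_top_eigvec(M)
r = rayleigh(M, v)
```

For this matrix the Rayleigh quotient `r` ends up within 1% of $\lambda_{\max}(M) = 3$, which is the form of guarantee $v^T M v \ge (1-\gamma)\lambda_{\max}(M)$ with $\gamma = 0.01$.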


4.4.2 Scalar Logarithmic Potential Algorithm For (PACKING-I)-(COVERING-I)

Given a normalized pair of packing-covering SDPs of type I, (PACKING-I)-(COVERING-I), and a feasible primal solution $X$, we use the logarithmic-potential weights $p_i := \frac{\epsilon}{m} \cdot \frac{\theta^*}{\theta^* - A_i \bullet X}$ for $i \in [m]$. Averaging the inequalities with respect to this set of weights, we arrive at the knapsack problem (4.9). This gives rise to the following procedure:


15 < 0; £o < nt < 0; v(0) < 1; X(0) < L17 (for an arbitrary i € [n]) 
2 while £, > є do 


3 
3 | ôs < = while v(t) > в, do 


4 O(t) < 0* (t)*, where 6*(t) is the smallest positive root of the equation 
E50 1 
=1 
m УЫ шу X0 
,0 (t 1 
КОРЕ Om fori e[n] /* 


m O) – A X(t)’ 
Set the dual solution */ v(t) = argmin,.., A(t) ө vv”, where 
A(t) := M, p;,(A; /* Find the eigenvector corresponding to the 
smallest eigenvalue of the average inequality matrix */ 


T E, T 

v(t + 1) « aw) n= 20) ААО /* Compute the error */ 
A(t) e X(t) + A(t) ө v(t)u(t)? 

£0 (t)v(t + 1) 
Am(A(t) e X(t) + A(t) ev(t)v(1)7) 
size */ X(t 3-1) < (1— v(t3- D)X(r +T +w /* 
Update the primal solution */ t «— t + 1 
5 | end 
6 Es+1 © 3 
7|s«—scl 


8 end 


У оу X(1—1) (1+;_1)у(7—1) 
9 output (X, y) = (a a/m (—854X10—25, 00 5) 


t(t + 1) < 


/* Compute the step 


Algorithm 3: Scalar logarithmic-potential algorithm for (PACKING- I)- 
(COVERING- I) 


In the above, for given numbers $x \in \mathbb{R}_+$ and $\delta \in (0, 1)$, we define the $\delta$-(upper) approximation $x^\delta$ of $x$ to be a number satisfying $x \le x^\delta \le (1 + \delta)x$.


Theorem 4.6 Given є є (0, 1], Algorithm 3 outputs an €-optimal solution for 


(COVERING- I)-(PACKING- I) in О (m log ү + m/e?) iterations, where yy :— ECO 
and each iteration requires an oracle call that computes an eigenvector corresponding to the minimum eigenvalue of a psd matrix.

4.5 Matrix Algorithms

4.5.1 Matrix MWU Algorithm For (COVERING-II)-(PACKING-II)

Let $F(y) := \sum_{i=1}^{m} y_i A_i$. Then, we can rewrite the normalized version of (PACKING-II) as follows:

$$z^*_{II} = \max\ \mathbf{1}^T y \qquad \text{(PACKING-II)}$$
$$\text{s.t.}\quad \lambda_j(F(y)) \le 1, \quad \forall j \in [n]$$
$$y \in \mathbb{R}^m, \quad y \ge 0.$$

Averaging the inequalities with respect to the weights $\bar{p}_j := \frac{p_j}{\sum_{j'} p_{j'}}$, where $p_j := (1+\epsilon)^{\lambda_j(F(y))}$, we get

$$\max\ \mathbf{1}^T y$$
$$\text{s.t.}\quad \sum_{j} \bar{p}_j \lambda_j(F(y)) \le 1$$
$$y \in \mathbb{R}^m, \quad y \ge 0.$$

Using the eigenvalue decomposition $F(y) = U \Lambda U^T$, where $\Lambda$ is the diagonal matrix containing the eigenvalues of $F(y)$ and $UU^T = I$, and letting

$$P := U\, \mathrm{diag}(\bar{p}_1, \ldots, \bar{p}_n)\, U^T = \frac{(1+\epsilon)^{F(y)}}{\mathrm{Tr}\left((1+\epsilon)^{F(y)}\right)},$$

we obtain the following knapsack problem:

$$\max\ \mathbf{1}^T y$$
$$\text{s.t.}\quad \sum_{i=1}^{m} (P \bullet A_i)\, y_i \le 1$$
$$y \in \mathbb{R}^m, \quad y \ge 0.$$

An optimal solution is attained at the basis vector $y = \mathbf{1}_i \in \mathbb{R}^m$ that minimizes $P \bullet A_i$. This gives rise to the following matrix MWU algorithm:


1  $t \leftarrow 0$; $y(0) \leftarrow 0$; $X(0) \leftarrow 0$; $M(0) \leftarrow 0$; $T \leftarrow \epsilon^{-2} \ln n$
2  while $M(t) < T$ do
3  |  $P(t) \leftarrow (1+\epsilon)^{F(y(t))} / \mathrm{Tr}\left((1+\epsilon)^{F(y(t))}\right)$   /* Update the weight matrix by exponentiation */
4  |  $i(t) \leftarrow \mathrm{argmin}_i A_i \bullet P(t)$
   |  $\delta(t) \leftarrow 1 / \lambda_{\max}(A_{i(t)})$   /* Define the update step size */
   |  $X(t+1) \leftarrow X(t) + \delta(t) P(t)$; $y(t+1) \leftarrow y(t) + \delta(t) \mathbf{1}_{i(t)}$   /* Update the primal-dual solution */
5  |  $M(t+1) \leftarrow \lambda_{\max}\left(\sum_i y_i(t+1) A_i\right)$   /* Compute the largest eigenvalue of LHS of dual */
6  |  $t \leftarrow t + 1$
7  end
8  $L(t) \leftarrow \min_i A_i \bullet X(t)$
9  output $(\hat{X}, \hat{y}) = \left(\frac{X(t)}{L(t)}, \frac{y(t)}{M(t)}\right)$

Algorithm 4: Matrix MWU algorithm for (PACKING-II)-(COVERING-II)

Theorem 4.7 Given a real $\epsilon \in (0, 1]$, Algorithm 4 outputs an $\epsilon$-optimal solution for (COVERING-II)-(PACKING-II) in $O(n \log n / \epsilon^2)$ iterations, where each iteration requires a matrix exponential computation, two oracle calls that compute the maximum eigenvalue of a psd matrix, and a single oracle call to the minimization in step 4.

The most demanding step in the above algorithm is the matrix exponential computation, which can be done in $O(n^3)$ time via a complete eigenvalue decomposition. A more efficient approximation, particularly when the matrices $A_i$ are sparse, can be obtained via the following result.


Theorem 4.8 ([26]) There is an algorithm for approximating the matrix exponential 
ef in time O (r?r log? 1), where r denotes the number of non-zeros in F € S", and 
є is the approximation accuracy. 


We remark that a matrix MWU algorithm and a theorem similar to Algorithm 4 
and Theorem 4.7 for (PACKING- I)-(COVERING- I) have not yet been discovered and 
are left as open problems. 


4.5.2 Matrix Logarithmic Potential Algorithm For (PACKING-I)-(COVERING-I)

Let $F(y) := \sum_{i=1}^{m} y_i A_i$. Then, we can rewrite the normalized version of (COVERING-I) as

$$z^*_{I} = \min\ \mathbf{1}^T y \qquad \text{(COVERING-I)}$$
$$\text{s.t.}\quad \lambda_j(F(y)) \ge 1, \quad \forall j \in [n]$$
$$y \in \mathbb{R}^m, \quad y \ge 0.$$

Averaging the inequalities with respect to the weights $p_j := \frac{\epsilon}{n} \cdot \frac{\theta^*}{\lambda_j(F(y)) - \theta^*}$, we get

$$\min\ \mathbf{1}^T y$$
$$\text{s.t.}\quad \sum_{j} p_j \lambda_j(F(y)) \ge 1$$
$$y \in \mathbb{R}^m, \quad y \ge 0.$$

Using the eigenvalue decomposition $F(y) = U \Lambda U^T$, where $\Lambda$ is the diagonal matrix containing the eigenvalues of $F(y)$ and $UU^T = I$, and letting

$$P := U\, \mathrm{diag}(p_1, \ldots, p_n)\, U^T = \frac{\epsilon\, \theta^*}{n} \left(F(y) - \theta^* I\right)^{-1},$$

we obtain the following knapsack problem:

$$\min\ \mathbf{1}^T y$$
$$\text{s.t.}\quad \sum_{i=1}^{m} (P \bullet A_i)\, y_i \ge 1$$
$$y \in \mathbb{R}^m, \quad y \ge 0.$$

An optimal solution is attained at the basis vector $y = \mathbf{1}_i \in \mathbb{R}^m$ that maximizes $P \bullet A_i$. This gives rise to the following matrix logarithmic-potential algorithm:


15 «0; 50 — 431-0; 010—141 
2 while ғ; > є do 
3 
3 ôs < 3h 
4 while v(t) > £s do 
5 


Es 


Fiii Pe А Ө zi 
O(t) < 0Ө* (t). , where 6* (т) is the smallest positive root of the equation Tr(F (y(t) — ӨГ) 121 
T n 


0 
X(t) — £590 (уб) — ОЛЕ /* Set the primal solution */ i(t) < argmax; А; e X(t) /* Call 
n 


the maximization oracle */ 

6 uii (t) e Aj — XE) ө FOE) 
X(t) e Ajo) + X(t) e FOH) 
7 t(t+ 1) < nT /* Compute the step size */ 

An(X (t) e Ajit) + X(t) e F(y(0) 

yer e A= cG c D)y()  tG + Dl; /* Update the dual solution */ t «— t + 1 


/* Compute the error */ 


8 end 

9 бе = 5 
10 $ + +1 
11 end 


Aus (1—в;,_1)Х@—1)  y(r-1) 
12 output (X, 5) (ceu I-D 


Algorithm 5: Matrix logarithmic-potential algorithm for (PACKING- I)- 
(COVERING- I) 


The most demanding steps are the computation of $\theta(t)$ and $X(t)$ in step 5. Computing $\theta(t)$ can be done via binary search over a region determined by repeated matrix multiplications and approximate minimum eigenvalue computation (cf. Theorem 4.5). Once $\theta(t)$ is determined, computing $X(t)$ requires a single matrix inversion. The overall running time per iteration is $O(n^\omega)$ plus the time needed by the maximization oracle in step 5.


Theorem 4.9 Given є є (0, 1], Algorithm 5 outputs an €-optimal solution for 


(COVERING- I)-(PACKING- I) in O(n log Y + 4) iterations, where yy := тни 
and each iteration requires O (log ^) matrix multiplications and a single oracle call 


to the maximization in step 5. 

4.5.3 Matrix Logarithmic Potential Algorithm For (PACKING-II)-(COVERING-II)

A symmetric version of Algorithm 5 for (PACKING-II)-(COVERING-II) can be given as follows:


1 s < 0; £o < Lt = 0; v(0) < 1; y(0) < 1; (for an arbitrary i € [m]) 
2 while =; > є do 
3 
3 | б, < z$ while v(t) > £s do 
4 O(t) < Ө*(ї)®% ‚ where Ө* (т) is the smallest positive root of the equation 
€50 EN! £g0 (t) =i " 
——Tr(01 — FOHA) =1 X(t) < ODI — F(y(t))) /* Set the 
n n 
primal solution */ i(t) < argmin; Aj e X(t) /* Call the minimization oracle */ 
X(t) e F(y(t)) — X(t) e Aj 
v(t 4-1) < oe Oe Аф /* Compute the error */ 
X(t) e Aja) + X(t) e FOE) 
£50 (t)v(t + 1) р 
t(t+ 1) < /* Compute the step size */ 
4п(Х (t) e Aim + X(t) e F(y(t))) 
+) «—(-crt(r-1)y()-T (ro 01 /* Update the dual solution */ 
t<t+l1 
5 end 
6 Est] + = 
7 $ + sl 
8 end 
$ 2 Qü-c&-DXG-D  y(-D0 
9 output (X, y) — (Gio, ah) 


Algorithm 6: Matrix logarithmic-potential algorithm for (PACKING-II)-(COVERING-II)


Theorem 4.10 Given є є (0, 1], Algorithm 6 outputs an €-optimal solution for 
(PACKING- II)-(COVERING- II) in O(n log y + 4) iterations, where Y := O(log ©) 
and each iteration requires O(log ^) matrix inversions and a single oracle call to 
the minimization in step 4. 



K. Elbassioni and K. Makino 


Acknowledgements We thank Waleed Najy for many helpful discussions on this topic. This work 
was partially supported by JST CREST JPMJCR1402 and Grants-in-Aid for Scientific Research. 
The research of the first author was partially supported by Abu Dhabi Education & Knowledge — 
Abu Dhabi Award for Research Excellence (AARE18-152). 


References

1. F. Alizadeh, Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM J. Optim. 5(1), 13-51 (1995)
2. Z. Allen-Zhu, Y.T. Lee, L. Orecchia, Using optimization to obtain a width-independent, parallel, simpler, and faster positive SDP solver, in Proceedings of the Twenty-seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '16 (Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2016), pp. 1824-1831
3. S. Arora, E. Hazan, S. Kale, Fast algorithms for approximate semidefinite programming using the multiplicative weights update method (2005), pp. 339-348
4. S. Arora, E. Hazan, S. Kale, The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121-164 (2012)
5. S. Arora, S. Kale, A combinatorial, primal-dual approach to semidefinite programs (2007), pp. 221-236
6. S. Arora, S. Kale, A combinatorial, primal-dual approach to semidefinite programs. J. ACM 63(2), 12:1-12:35 (2016)
7. K. Elbassioni, K. Makino, Oracle-based primal-dual algorithms for packing and covering semidefinite programs, in 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany (2019), pp. 43:1-43:15
8. D. Garber, E. Hazan, Sublinear time algorithms for approximate semidefinite programming. Math. Program. 158(1-2), 329-361 (2016)
9. N. Garg, J. Könemann, Faster and simpler algorithms for multicommodity flow and other fractional packing problems. SIAM J. Comput. 37(2), 630-652 (2007)
10. M.X. Goemans, D.P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42(6), 1115-1145 (1995)
11. M.D. Grigoriadis, L.G. Khachiyan, A sublinear-time randomized approximation algorithm for matrix games. Operat. Res. Lett. 18(2), 53-58 (1995)
12. M.D. Grigoriadis, L.G. Khachiyan, L. Porkolab, J. Villavicencio, Approximate max-min resource sharing for structured concave optimization. SIAM J. Optim. 41, 1081-1091 (2001)
13. G. Iyengar, D.J. Phillips, C. Stein, Approximation algorithms for semidefinite packing problems with applications to maxcut and graph coloring, in Integer Programming and Combinatorial Optimization (IPCO), eds. by M. Jünger, V. Kaibel (Berlin, Heidelberg, 2005), pp. 152-166
14. G. Iyengar, D.J. Phillips, C. Stein, Feasible and accurate algorithms for covering semidefinite programs, in Algorithm Theory - SWAT 2010, ed. by H. Kaplan (Berlin, Heidelberg, 2010), pp. 150-162
15. G. Iyengar, D.J. Phillips, C. Stein, Approximating semidefinite packing programs. SIAM J. Optim. 21(1), 231-268 (2011)
16. R. Jain, P. Yao, A parallel approximation algorithm for positive semidefinite programming, in IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011 (2011), pp. 463-471
17. R. Jain, P. Yao, A parallel approximation algorithm for mixed packing and covering semidefinite programs. CoRR (2012). arXiv:abs/1201.6090
18. P. Klein, H.-I. Lu, Efficient approximation algorithms for semidefinite programs arising from max cut and coloring, in Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, STOC '96 (ACM, New York, NY, USA, 1996), pp. 338-347
19. Z. Leyk, H. Woźniakowski, Estimating a largest eigenvector by Lanczos and polynomial algorithms with a random start. Numer. Linear Algebra Appl. 5(3), 147-164 (1999)
20. Y. Nesterov, Smooth minimization of non-smooth functions. Math. Program. 103(1), 127-152 (2005)
21. Y. Nesterov, Smoothing technique and its applications in semidefinite optimization. Math. Program. 110(2), 245-259 (2007)
22. Y. Nesterov, A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics (1994)
23. R. Peng, K. Tangwongsan, Faster and simpler width-independent parallel algorithms for positive semidefinite programming, in Proceedings of the Twenty-fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '12 (ACM, New York, NY, USA, 2012), pp. 101-108
24. R. Peng, K. Tangwongsan, P. Zhang, Faster and simpler width-independent parallel algorithms for positive semidefinite programming. CoRR (2016). arXiv:abs/1201.5135
25. S.A. Plotkin, D.B. Shmoys, É. Tardos, Fast approximation algorithms for fractional packing and covering problems (1991), pp. 495-504
26. J. van den Eshof, M. Hochbruck, Preconditioning Lanczos approximations to the matrix exponential. SIAM J. Sci. Comput. 27(4), 1438-1457 (2006)
27. L. Vandenberghe, S. Boyd, Semidefinite programming. SIAM Rev. 38(1), 49-95 (1996)
28. E.P. Xing, M.I. Jordan, S.J. Russell, A.Y. Ng, Distance metric learning with application to clustering with side-information, in Advances in Neural Information Processing Systems 15, ed. by S. Becker, S. Thrun, K. Obermayer (MIT Press, Cambridge, 2003), pp. 521-528
29. N.E. Young, Sequential and parallel algorithms for mixed packing and covering (2001), pp. 538-546


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 


The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


Chapter 5
Almost Linear Time Algorithms for Some
Problems on Dynamic Flow Networks


Yuya Higashikawa, Naoki Katoh, and Junichi Teruyama 


Abstract Motivated by evacuation planning, several problems regarding dynamic 
flow networks have been studied in recent years. A dynamic flow network consists of 
an undirected graph with positive edge lengths, positive edge capacities, and positive 
vertex weights. The road network in an area can be treated as a graph where the edge 
lengths are the distances along the roads and the vertex weights are the number of 
people at each site. An edge capacity limits the number of people that can enter the 
edge per unit time. In a dynamic flow network, when particular points on edges or 
vertices called sinks are given, all of the people are required to evacuate from the 
vertices to the sinks as quickly as possible. This chapter gives an overview of two 
of our recent results on the problem of locating multiple sinks in a dynamic flow 
path network such that the max/sum of evacuation times for all the people to sinks 
is minimized, and we focus on techniques that enable the problems to be solved in 
almost linear time. 


5.1 Introduction 


Recently, many parts of the world have been affected by disasters including earth- 
quakes, nuclear plant accidents, volcanic eruptions, and flooding, highlighting the 
urgent need for orderly evacuation planning. One powerful tool for evacuation plan- 
ning is the dynamic flow model introduced by Ford and Fulkerson [10], which repre- 
sents movement of commodities over time in a network. In this model, we are given a 
graph with source vertices and sink vertices. Each source vertex is associated with a 
positive weight, called a supply; each sink vertex is associated with a positive weight, 
called a demand; and each edge is associated with a positive length and capacity. An edge capacity limits the amount of supply that can enter the edge per unit time. One variant of the dynamic flow problem is the quickest transshipment problem, in which the objective is to send exactly the right amount of supply out of sources into sinks while satisfying demand constraints in the minimum overall time. Hoppe and Tardos [17] provided a polynomial time algorithm for this problem in the case where the transit times are integral. However, the complexity of their algorithm is very high. Finding a practical polynomial time solution to this is still an open problem. Readers are referred to a recent survey by Skutella [20] on dynamic flows.

Table 5.1 Summary of minmax k-sink problems

Graph class     Capacities   Result
Path            General      $O(n \log n + k^2 \log^4 n)$, $O(n \log^3 n)$ [7]
Path            Uniform      $O(n + k^2 \log^2 n)$, $O(n \log n)$ [7]
Tree            General      $O(\max\{k, \log n\} \cdot kn \log^4 n)$ [9]
Tree            Uniform      $O(\max\{k, \log n\} \cdot kn \log^3 n)$ [9]
General graph   General      FPTAS for a fixed $k$ [3]
General graph   Uniform      FPTAS for a fixed $k$ [3]

This chapter discusses related problems called k-sink problems [3, 5—9, 14—16, 
18], in which the objective is to find the locations of k sinks in a given dynamic 
flow network so that all the supply is sent to the sinks as quickly as possible. The 
following two criteria can be naturally considered for determining the optimality of 
the locations: minimization of evacuation completion time and aggregate evacuation 
time (i.e., average evacuation time). We call the k-sink problem that requires finding 
the locations of k sinks that minimize the evacuation completion time (resp., the 
aggregate evacuation time) the minmax (resp., minsum) k-sink problem. Although 
several papers have studied minmax k-sink problems in dynamic flow networks [3, 
7—9, 14, 15, 18], minsum k-sink problems in dynamic flow networks have not been 
studied except for the case of path networks [5, 6, 15, 16].¹ Tables 5.1 and 5.2
summarize the previous results for the minmax k-sink problems and the minsum 
k-sink problems, respectively. 

There are two models for the evacuation method. Under the confluent flow model, 
all the supply leaving a vertex must evacuate to the same sink through the same 
edges, and under the non-confluent flow model, there is no such restriction. To our 
knowledge, almost all of the papers that deal with the k-sink problems [3, 5—9, 15] 
adopt the confluent flow model, while only one paper [16] handles both of the models. 

Although it may seem natural to model the evacuation behavior of people by treating each supply as a discrete quantity as in [17, 18], almost all of the previous papers on sink problems [3, 7-9, 14-16] have treated each supply as a continuous quantity since it is easier to treat the problems mathematically and the effect is negligible when the number of people is large. Throughout this chapter, we adopt the model with continuous supplies.

¹ Note that the minsum 1-sink problem in general networks can be solved in polynomial time by applying the following two facts: (1) Baumann and Skutella [2] provided a polynomial time algorithm for the problem of computing a dynamic flow to a fixed sink in a general network while minimizing the aggregate evacuation time. (2) For the minsum 1-sink problem in general networks, one can prove that there exists an optimal sink located at a vertex in a similar manner to the well-known node optimality theorem for the 1-median problem [12].

Table 5.2 Summary of minsum k-sink problems

Graph class     Capacities   Result
Path            General      $O(kn \log^4 n)$ [6]
                             $\min\{O(kn \log^3 n),\ n 2^{O(\sqrt{\log k \log\log n})} \log^3 n\}$ [16]
Path            Uniform      $O(kn \log^3 n)$ [5]
                             $\min\{O(kn \log^2 n),\ n 2^{O(\sqrt{\log k \log\log n})} \log^2 n\}$ [16]
Tree                         Open
General graph                Open

We also give an overview of two of our recent results [7, 16] on the problems 
of locating multiple sinks on dynamic flow path networks such that the max/sum of 
evacuation times for all the people to sinks is minimized, and we focus on algorithmic 
frameworks that enable solving the problems in almost linear time. 


5.2 Preliminaries 


For two real values $a, b$ with $a < b$, let $[a, b] = \{t \in \mathbb{R} \mid a \le t \le b\}$, 
$[a, b) = \{t \in \mathbb{R} \mid a \le t < b\}$, $(a, b] = \{t \in \mathbb{R} \mid a < t \le b\}$, and 
$(a, b) = \{t \in \mathbb{R} \mid a < t < b\}$, where $\mathbb{R}$ is the set of real values. For two 
integers $i, j$ with $i \le j$, let $[i..j] = \{h \in \mathbb{Z} \mid i \le h \le j\}$, where $\mathbb{Z}$ is the 
set of integers. A dynamic flow path network $\mathcal{P}$ is given as a 5-tuple 
$(P, w, c, l, \tau)$, where $P$ is a path with vertex set $V = \{v_i \mid i \in [1..n]\}$ and 
edge set $E = \{e_i = (v_i, v_{i+1}) \mid i \in [1..n-1]\}$, $w$ is a vector $(w_1, \ldots, w_n)$ of which 
each component $w_i$ is the weight of vertex $v_i$ representing the amount of supply (e.g., 
the number of evacuees or cars) located at $v_i$, $c$ is a vector $(c_1, \ldots, c_{n-1})$ of which 
each component $c_i$ is the capacity of edge $e_i$ representing the upper bound on the 
flow amount that can enter $e_i$ per unit time, $l$ is a vector $(\ell_1, \ldots, \ell_{n-1})$ of which each 
component $\ell_i$ is the length of edge $e_i$ (i.e., the distance between the two end vertices of 
$e_i$), and $\tau$ is the time taken for unit supply to move unit distance along any edge. 

We say that a point $p$ lies on path $P = (V, E)$, denoted by $p \in P$, if $p$ lies on 
a vertex $v \in V$ or an edge $e \in E$. We assume that path $P$ can be represented by a 
horizontal line segment along which the vertices $v_1, v_2, \ldots, v_n$ are arranged in order 
from left to right. For two points $p, q \in P$, $p < q$ means that $p$ lies to the left side of $q$, 
and $p \le q$ means that $p < q$ or $p$ and $q$ lie at the same location. 
For two points $p, q \in P$ such that $p \le q$, $p$ divides an edge $(v_i, v_{i+1})$ in the ratio 
$r_p : 1 - r_p$, and $q$ divides an edge $(v_j, v_{j+1})$ in the ratio $r_q : 1 - r_q$, let $L(p, q)$ be the 
distance between $p$ and $q$, that is, 

\[ L(p, q) = (1 - r_p)\,\ell_i + r_q\,\ell_j + \sum_{h=i+1}^{j-1} \ell_h \]

(where $\sum_{h=i+1}^{i} \ell_h = 0$ and $\sum_{h=i+1}^{i-1} \ell_h = -\ell_i$). Let us consider two integers 
$i, j \in [1..n]$ with $i < j$. We denote by $P_{i,j}$ the subpath of $P$ from $v_i$ to $v_j$, and by 
$\mathcal{P}_{i,j}$ the subnetwork of $\mathcal{P}$ consisting of subpath $P_{i,j}$. Let $L_{i,j}$ be the distance between 
$v_i$ and $v_j$, that is, $L_{i,j} = \sum_{h=i}^{j-1} \ell_h$, and let $C_{i,j}$ be the minimum capacity among all the 
edges between $v_i$ and $v_j$, that is, $C_{i,j} = \min\{c_h \mid h \in [i..j-1]\}$. For $i \in [1..n]$, we 
denote the sum of weights from $v_1$ to $v_i$ by $W_i = \sum_{j=1}^{i} w_j$. Note that, given a dynamic 
flow path network $\mathcal{P}$, if we construct two lists of $W_i$ and $L_{1,i}$ for all $i \in [1..n]$ in $O(n)$ 
preprocessing time, we can obtain $W_i$ for any $i \in [1..n]$ and $L_{i,j} = L_{1,j} - L_{1,i}$ for any 
$i, j \in [1..n]$ with $i < j$ in $O(1)$ time. In addition, $C_{i,j}$ for any $i, j \in [1..n]$ with $i < j$ can 
be obtained in $O(1)$ time with $O(n)$ preprocessing time, which is known as the range 
minimum query [1, 4]. 
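
The preprocessing described above can be sketched as follows. This is only an illustration under stated assumptions: the function names (`preprocess`, `dist`, `min_capacity`) are hypothetical, and the range minimum query is answered with a sparse table, which needs $O(n \log n)$ preprocessing rather than the $O(n)$ achieved by the structures in [1, 4].

```python
from itertools import accumulate


def preprocess(w, c, l):
    """Build lookup tables for W_i, L_{1,i}, and C_{i,j}.

    w: list of n vertex weights; c, l: lists of n-1 edge capacities/lengths.
    Vertices and edges are 1-based in the text and 0-based in these lists.
    """
    W = [0] + list(accumulate(w))        # W[i] = w_1 + ... + w_i, W[0] = 0
    L1 = [0, 0] + list(accumulate(l))    # L1[i] = distance from v_1 to v_i
    # Sparse table (an assumption of this sketch): table[p][a] = min(c[a:a+2**p]).
    table = [list(c)]
    p = 1
    while (1 << p) <= len(c):
        prev, half = table[-1], 1 << (p - 1)
        table.append([min(prev[a], prev[a + half])
                      for a in range(len(c) - (1 << p) + 1)])
        p += 1
    return W, L1, table


def dist(L1, i, j):
    """L_{i,j} for 1-based i <= j, in O(1) time."""
    return L1[j] - L1[i]


def min_capacity(table, i, j):
    """C_{i,j} = min{c_h | h in [i..j-1]} for 1-based i < j, in O(1) time."""
    lo, hi = i - 1, j - 1                # 0-based half-open range [lo, hi)
    p = (hi - lo).bit_length() - 1       # largest power of two <= range length
    return min(table[p][lo], table[p][hi - (1 << p)])
```

For example, with `w = [3, 1, 4]`, `c = [5, 2]`, and `l = [10, 20]`, the tables give $W_3 = 8$, $L_{2,3} = 20$, and $C_{1,3} = 2$.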

A k-sink $x$ is a k-tuple $(x_1, \ldots, x_k)$ of points on $P$ such that $x_i < x_j$ for any $i < j$. 
We assume that no two sinks lie on the same edge.² We define the function Id for 
point $p \in P$ as follows: the value $\mathrm{Id}(p)$ is an integer such that $v_{\mathrm{Id}(p)} \le p < v_{\mathrm{Id}(p)+1}$ 
holds, that is, if $p$ lies on edge $(v_i, v_{i+1})$ or at vertex $v_i$, then $\mathrm{Id}(p) = i$. A divider $d$ is 
a $(k-1)$-tuple $(d_1, \ldots, d_{k-1})$ of real values such that $0 \le d_i \le d_j \le W_n$ for any 
$i < j$. A pair $(x, d)$ is called valid if and only if $W_{\mathrm{Id}(x_i)} \le d_i \le W_{\mathrm{Id}(x_{i+1})}$ holds for 
any $i$. A valid pair $(x, d)$ determines what amount of supply from which vertex flows 
to which sink so that the portion $d_i - d_{i-1}$ of supply is assigned to flow to sink $x_i$, 
where $d_0 = 0$ and $d_k = W_n$. More precisely, given a valid pair $(x, d)$, the portion 
$W_{\mathrm{Id}(x_i)} - d_{i-1}$ of supply that originates from the left side of $x_i$ flows to sink $x_i$, and 
the portion $d_i - W_{\mathrm{Id}(x_i)}$ of supply that originates from the right side of $x_i$ also flows to 
sink $x_i$. For instance, under the non-confluent flow model, if $W_{h-1} < d_i < W_h$ where 
$h \in [1..n]$, the portion $d_i - W_{h-1}$ of the $w_h$ supply at $v_h$ flows to sink $x_i$ and the 
remaining $W_h - d_i$ flows to sink $x_{i+1}$. The difference between the confluent flow 
model and the non-confluent flow model is that the confluent flow model requires 
that each value $d_i$ of a divider $d$ must take a value in $\{W_1, \ldots, W_n\}$, whereas the 
non-confluent flow model does not. For a dynamic flow path network $\mathcal{P}$ and a valid 
pair $(x, d)$, the evacuation completion time $\mathrm{CT}(\mathcal{P}, x, d)$ is the time at which all the 
supply completes the evacuation. The aggregate evacuation time $\mathrm{AT}(\mathcal{P}, x, d)$ is the 
sum of the evacuation completion time for all the supply. Explicit definitions of these 
are given in Sect. 5.3. 


5.3 Objective Functions 


Suppose that we are given a divider $d = (d_1, \ldots, d_{k-1})$. This $d$ implies that we have 
$k$ 1-sink subproblems. The $i$th subproblem consists of a subnetwork $\mathcal{P}_{h,h'}$ such that 
the weight of $v_j$ is $w_j$ for $j \in [h+1..h'-1]$, while those of $v_h$ and $v_{h'}$ are $W_h - d_{i-1}$ 
and $d_i - W_{h'-1}$, respectively, where $W_{h-1} < d_{i-1} \le W_h$ and $W_{h'-1} < d_i \le W_{h'}$. To 
explicitly define the evacuation completion time and the aggregate evacuation time, 
we first consider the case of the 1-sink problem, and then extend the argument to the 
general case of the k-sink problem. 


² It turns out that this assumption does not result in a loss of generality once the cost function is 
introduced later. If some adjacent two sinks $x_i$ and $x_{i+1}$ lie on edge $(v_j, v_{j+1})$, that is, 
$v_j \le x_i < x_{i+1} \le v_{j+1}$, moving $x_i$ to $v_j$ or moving $x_{i+1}$ to $v_{j+1}$ does not increase the cost. 


5.3.1 Objective Functions for the 1-Sink Problem 


Given a dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$ with $n$ vertices, we assign 
a unique sink to a point $x$, that is, $x = (x)$ and $d = ()$, which is the 0-tuple. We 
consider only the case where $x$ is on an edge $e_i$ excluding its end vertices, that is, 
$v_i < x < v_{i+1}$, since the case where $x$ is on a vertex can be treated similarly. In this 
case, all the supply on the left side of $x$ (i.e., at $v_1, \ldots, v_i$) flows to the right toward 
sink $x$, and all the supply on the right side of $x$ (i.e., at $v_{i+1}, \ldots, v_n$) flows to the left 
toward sink $x$. 

To treat this case, we introduce some new notation. Let the function $\theta^{x,+}(z)$ 
denote the time at which the first $z - W_i$ of supply on the right side of $x$ completes 
its evacuation to sink $x$ (where $\theta^{x,+}(z) = 0$ for $z \in [0, W_i]$). Similarly, let $\theta^{x,-}(z)$ 
denote the time at which the first $W_i - z$ of supply on the left side of $x$ completes 
its evacuation to sink $x$ (where $\theta^{x,-}(z) = 0$ for $z \in [W_i, W_n]$). Higashikawa [13] 
showed that the values $\theta^{x,+}(W_n)$ and $\theta^{x,-}(0)$, which are the evacuation completion 
times for all the supply on the right and left sides of $x$, respectively, are given by the 
following formulae: 

\[ \theta^{x,+}(W_n) = \max\left\{ \frac{W_n - W_{j-1}}{C_{i,j}} + \tau \cdot L(x, v_j) \;\middle|\; j \in [i+1..n] \right\}, \quad \text{and} \tag{5.1} \]

\[ \theta^{x,-}(0) = \max\left\{ \frac{W_j}{C_{j,i+1}} + \tau \cdot L(v_j, x) \;\middle|\; j \in [1..i] \right\}. \tag{5.2} \]

Using these, the evacuation completion time $\mathrm{CT}(\mathcal{P}, (x), ())$ is given by 

\[ \mathrm{CT}(\mathcal{P}, (x), ()) = \max\left\{ \theta^{x,+}(W_n),\ \theta^{x,-}(0) \right\}. \tag{5.3} \]

We can generalize formulae (5.1) and (5.2) to the case of any $z \in [0, W_n]$ as follows: 

\[ \theta^{x,+}(z) = \max\{ \theta^{x,+,j}(z) \mid j \in [i+1..n] \}, \tag{5.4} \]

where $\theta^{x,+,j}(z)$ for $j \in [i+1..n]$ is defined as 

\[ \theta^{x,+,j}(z) = \begin{cases} 0 & \text{if } z \le W_{j-1}, \\ \dfrac{z - W_{j-1}}{C_{i,j}} + \tau \cdot L(x, v_j) & \text{if } z > W_{j-1}, \end{cases} \tag{5.5} \]

and 

\[ \theta^{x,-}(z) = \max\{ \theta^{x,-,j}(z) \mid j \in [1..i] \}, \tag{5.6} \]

where $\theta^{x,-,j}(z)$ is defined for $j \in [1..i]$ as 

\[ \theta^{x,-,j}(z) = \begin{cases} \dfrac{W_j - z}{C_{j,i+1}} + \tau \cdot L(v_j, x) & \text{if } z < W_j, \\ 0 & \text{if } z \ge W_j. \end{cases} \tag{5.7} \]

Fig. 5.1 The blue (resp., red) thick half-open segments indicate the function $\theta^{x,+}(z)$ (resp., 
$\theta^{x,-}(z)$). The gray area indicates $\mathrm{AT}(\mathcal{P}, (x), ())$ 

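Then, the aggregate evacuation times for the supply on the right and left sides of $x$ 
are 

\[ \int_{W_i}^{W_n} \theta^{x,+}(z)\,dz \quad \text{and} \quad \int_{0}^{W_i} \theta^{x,-}(z)\,dz, \]

respectively. Thus, the aggregate evacuation time $\mathrm{AT}(\mathcal{P}, (x), ())$ is given by 

\[ \mathrm{AT}(\mathcal{P}, (x), ()) = \int_{0}^{W_i} \theta^{x,-}(z)\,dz + \int_{W_i}^{W_n} \theta^{x,+}(z)\,dz. \tag{5.8} \]
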

See also Fig. 5.1. 
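
As an illustration of the closed forms (5.1)–(5.3), the following sketch evaluates the evacuation completion time for a single sink placed in the interior of an edge. It is a direct $O(n)$ scan rather than the data-structure-based evaluation used later in the chapter. The function name `one_sink_times` and the parameter `r` (the fraction of edge $e_i$ between $v_i$ and the sink) are assumptions made for this example.

```python
def one_sink_times(w, c, l, tau, i, r):
    """Evaluate Eqs. (5.1)-(5.3) for a sink x inside edge e_i = (v_i, v_{i+1}).

    w: n vertex weights; c, l: n-1 edge capacities/lengths (0-based lists).
    i: 1-based edge index; r: fraction of e_i from v_i to x, with 0 < r < 1.
    Returns (theta_plus(W_n), theta_minus(0), CT).
    """
    n = len(w)
    W = [0]
    for wi in w:
        W.append(W[-1] + wi)             # W[j] = w_1 + ... + w_j

    # Right side, Eq. (5.1): j = i+1..n, cap = C_{i,j}, d = L(x, v_j).
    right, cap, d = 0.0, float("inf"), (1 - r) * l[i - 1]
    for j in range(i + 1, n + 1):
        cap = min(cap, c[j - 2])         # include edge e_{j-1}
        right = max(right, (W[n] - W[j - 1]) / cap + tau * d)
        if j < n:
            d += l[j - 1]                # extend distance to v_{j+1}

    # Left side, Eq. (5.2): j = i..1, cap = C_{j,i+1}, d = L(v_j, x).
    left, cap, d = 0.0, float("inf"), r * l[i - 1]
    for j in range(i, 0, -1):
        cap = min(cap, c[j - 1])         # include edge e_j
        left = max(left, W[j] / cap + tau * d)
        if j > 1:
            d += l[j - 2]                # extend distance to v_{j-1}

    return right, left, max(right, left)  # Eq. (5.3)
```

For instance, with `w = [4, 6]`, `c = [2]`, `l = [10]`, `tau = 1`, and the sink at the midpoint of the only edge, the right side takes $6/2 + 5 = 8$ and the left side takes $4/2 + 5 = 7$, so the completion time is 8.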


5.3.2 Objective Functions for k-Sink 


Let us consider a valid pair consisting of a k-sink $x = (x_1, \ldots, x_k)$ and a divider 
$d = (d_1, \ldots, d_{k-1})$ such that each sink is on an edge excluding its end vertices, that 
is, $v_{\mathrm{Id}(x_i)} < x_i < v_{\mathrm{Id}(x_i)+1}$. In this situation, for each $i \in [1..k]$, the first $d_i - W_{\mathrm{Id}(x_i)}$ 
of supply on the right side of $x_i$ and the first $W_{\mathrm{Id}(x_i)} - d_{i-1}$ of supply on the left 
side of $x_i$ move to sink $x_i$. By the argument of the previous section, the evacuation 
completion times for the supply on the right and left sides of $x_i$ are represented by 
$\theta^{x_i,+}(d_i)$ and $\theta^{x_i,-}(d_{i-1})$, 
respectively. Thus, the evacuation completion time $\mathrm{CT}(\mathcal{P}, x, d)$ is given by 

\[ \mathrm{CT}(\mathcal{P}, x, d) = \max\left\{ \theta^{x_i,+}(d_i),\ \theta^{x_i,-}(d_{i-1}) \;\middle|\; i \in [1..k] \right\}, \tag{5.9} \]

where $d_0 = 0$ and $d_k = W_n$. The aggregate evacuation times for the supply on the 
right and left sides of $x_i$ are 

\[ \int_{W_{\mathrm{Id}(x_i)}}^{d_i} \theta^{x_i,+}(z)\,dz \quad \text{and} \quad \int_{d_{i-1}}^{W_{\mathrm{Id}(x_i)}} \theta^{x_i,-}(z)\,dz, \]

respectively. Thus, the aggregate evacuation time $\mathrm{AT}(\mathcal{P}, x, d)$ is given by 

\[ \mathrm{AT}(\mathcal{P}, x, d) = \sum_{i \in [1..k]} \left( \int_{d_{i-1}}^{W_{\mathrm{Id}(x_i)}} \theta^{x_i,-}(z)\,dz + \int_{W_{\mathrm{Id}(x_i)}}^{d_i} \theta^{x_i,+}(z)\,dz \right), \tag{5.10} \]

where $d_0 = 0$ and $d_k = W_n$. 


5.4 Minmax k-Sink Problems on Paths 


In this section, we consider the minmax k-sink problems on path networks under the 
confluent flow model, which is precisely defined as 


(MINMAX-k-SINK-PATH-CONFLUENT-FLOW) 
Input: A dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$. 
Goal: Find a solution $(x, d)$ to the problem 

\[
\begin{aligned}
\text{min.}\quad & \mathrm{CT}(\mathcal{P}, x, d) \\
\text{s.t.}\quad & x = (x_1, \ldots, x_k) \in P^k, \quad x_h < x_l \ \ \forall h < l, \\
& d = (d_1, \ldots, d_{k-1}) \in \{W_h \mid h \in [1..n]\}^{k-1}, \quad W_{\mathrm{Id}(x_h)} \le d_h \le W_{\mathrm{Id}(x_{h+1})} \ \ \forall h.
\end{aligned}
\]

For the MINMAX-k-SINK-PATH-CONFLUENT-FLOW problem, [7] reported the following 
result, which is the best so far: 


Theorem 5.1 ([7]) The MINMAX-k-SINK-PATH-CONFLUENT-FLOW problem can be 
solved in $O(\min\{n \log n + k^2 \log^4 n,\ n \log^3 n\})$ time. Moreover, if the capacities of $\mathcal{P}$ 
are uniform, the MINMAX-k-SINK-PATH-CONFLUENT-FLOW problem can be solved 
in $O(\min\{n + k^2 \log^2 n,\ n \log n\})$ time. 


Theorem 5.1 implies that the problem is solved in almost linear time for any $k$. In 
[7], two kinds of algorithms are provided: One is an $O(n \log n + k^2 \log^4 n)$ time 
algorithm based on the parametric search method, and the other is an $O(n \log^3 n)$ 
time algorithm based on the sorted matrix method. 

Both algorithms require repeatedly solving the problems of locating 1-sink for 
multiple choices of different subnetworks. Note that the optimal solution for the 
problem of locating 1-sink on $\mathcal{P}_{i,j}$ is a point $x^*$ that minimizes the following expression 
over $x \in P_{i,j}$: 

\[ \mathrm{CT}(\mathcal{P}_{i,j}, (x), ()) = \max\left\{ \theta^{x,+}(W_j),\ \theta^{x,-}(W_{i-1}) \right\}. \tag{5.11} \]


Both algorithms also require repeatedly performing feasibility tests for multiple 
choices of different subnetworks. We say that P; ; is (t, q)-feasible if and only if 
the answer of the following decision problem is “yes”: 


(FEASIBILITY-TEST-FOR-SUBPATH) 

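Input: A dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$, a positive real $t \in \mathbb{R}_+$, 
integers $q, i, j$ satisfying $q \in [1..k]$ and $i, j \in [1..n]$ with $i < j$. 

Goal: Determine whether there exists a pair of vectors $(x', d')$ such that 

\[
\begin{aligned}
& \mathrm{CT}(\mathcal{P}_{i,j}, x', d') \le t, \\
& x' = (x'_1, \ldots, x'_q) \in P_{i,j}^{\,q}, \quad x'_h < x'_l \ \ \forall h < l, \\
& d' = (d'_1, \ldots, d'_{q-1}) \in \{W_h \mid h \in [i..j]\}^{q-1}, \quad W_{\mathrm{Id}(x'_h)} \le d'_h \le W_{\mathrm{Id}(x'_{h+1})} \ \ \forall h.
\end{aligned}
\]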


Note that [7] developed a data structure called the CUE tree to efficiently compute 
$\theta^{x,-}(W_{i-1})$ and $\theta^{x,+}(W_j)$ for any integers $i, j \in [1..n]$ with $i < j$ and any $x \in P_{i,j}$. 
For the case of general edge capacities, the CUE tree can be constructed in $O(n \log n)$ 
time, and $\theta^{x,-}(W_{i-1})$ and $\theta^{x,+}(W_j)$ can be computed in $O(\log^2 n)$ time by using the 
CUE tree. See [7] for more detail. 


Lemma 5.1 ([7]) Given a dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$ with $n$ 
vertices, the CUE tree can be constructed in $O(n \log n)$ time. Moreover, if the 
capacities of $\mathcal{P}$ are uniform, the CUE tree can be constructed in $O(n)$ time. 


Lemma 5.2 ([7]) Given a dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$ with $n$ 
vertices, suppose that the CUE tree is available. Then, for any integers $i, j \in [1..n]$ 
with $i < j$ and any $x \in P_{i,j}$, $\theta^{x,-}(W_{i-1})$ and $\theta^{x,+}(W_j)$ can be computed in $O(\log^2 n)$ 
time. Moreover, if the capacities of $\mathcal{P}$ are uniform, $\theta^{x,-}(W_{i-1})$ and $\theta^{x,+}(W_j)$ can be 
computed in $O(\log n)$ time. 


In the rest of this section, we first describe how feasibility tests are performed 
in Sect. 5.4.1 and how the 1-sink problem for a subnetwork is solved in Sect. 5.4.2, 
and then show the frameworks of the parametric search method in Sect. 5.4.3 and the 
sorted matrix method in Sect. 5.4.4. 
5.4.1 Feasibility Test 


In [7], to solve the MINMAX-k-SINK-PATH-CONFLUENT-FLOW problem, an algorithm 
repeatedly tests the $(t, q)$-feasibility of $\mathcal{P}_{i,j}$ for multiple choices of different 4-tuples 
$(t, q, i, j)$. Let $\mathrm{CT}_{\mathrm{OPT}}(q, i, j)$ denote the optimal cost for the problem of locating 
q-sink on $\mathcal{P}_{i,j}$. Then, for a positive real $t \in \mathbb{R}_+$ and integers $q, i, j$ satisfying $q \in [1..k]$ 
and $i, j \in [1..n]$ with $i < j$, $\mathcal{P}_{i,j}$ is $(t, q)$-feasible if and only if $\mathrm{CT}_{\mathrm{OPT}}(q, i, j) \le t$ 
holds. 


Lemma 5.3 ([7]) Given a dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$ with $n$ 
vertices, suppose that the CUE tree is available. For integers $q, i, j$ satisfying 
$q \in [1..k]$ and $i, j \in [1..n]$ with $i < j$, the $(t, q)$-feasibility of $\mathcal{P}_{i,j}$ can be tested 
in $O(\min\{n \log^2 n,\ k \log^3 n\})$ time. Moreover, if the capacities of $\mathcal{P}$ are uniform, the 
$(t, q)$-feasibility of $\mathcal{P}_{i,j}$ can be tested in $O(\min\{n,\ k \log n\})$ time. 


Proof We prove only the case of general capacities. For the case of uniform capacity, 
see [7]. 

To determine the $(t, q)$-feasibility of $\mathcal{P}_{i,j}$, we place the sinks consecutively 
from left to right, each as far to the right as possible. We first compute the maximum 
integer $h$ such that $\theta^{v_h,-}(W_{i-1}) \le t$ and $\theta^{v_{h+1},-}(W_{i-1}) > t$ hold. Next, we solve 

\[ \theta^{v_{h+1},-}(W_{i-1}) - \alpha \cdot \tau \ell_h = t \tag{5.12} \]

for $\alpha$. If $\alpha < 1$, we move the leftmost sink $x_1$ to the point that divides edge $e_h = 
(v_h, v_{h+1})$ at a ratio of $1 - \alpha : \alpha$; otherwise we place $x_1$ at $v_h$. We then compute the 
maximum integer $l_1$ such that $\theta^{x_1,+}(W_{l_1}) \le t$ and $\theta^{x_1,+}(W_{l_1+1}) > t$ hold. We thus 
determine the maximal subnetwork $\mathcal{P}_{i,l_1}$ such that $\mathrm{CT}_{\mathrm{OPT}}(1, i, l_1) \le t$. In the same 
manner, we repeatedly isolate the maximal subnetworks $\mathcal{P}_{l_1+1,l_2}, \mathcal{P}_{l_2+1,l_3}, \ldots$, 
and if the $q$th subnetwork is found to have $l_q < j$, then $\mathcal{P}_{i,j}$ is not $(t, q)$-feasible; 
otherwise it is $(t, q)$-feasible. 

Let us now look at the time complexity. Isolating $\mathcal{P}_{i,l_1}$ consists of (a) computing 
$h$, (b) solving the equation for $\alpha$, and (c) computing $l_1$. Obviously (b) takes $O(1)$ 
time. For (a), applying a binary search takes $O(\log^3 n)$ time because we compute 
$\theta^{v_a,-}(W_{i-1})$ over $a \in [i..j]$ $O(\log n)$ times and each $\theta^{v_a,-}(W_{i-1})$ can be computed in 
$O(\log^2 n)$ time using the CUE tree by Lemma 5.2. Similarly (c) takes $O(\log^3 n)$ time 
by binary search. In this way, we can isolate at most $q$ subnetworks in $O(q \log^3 n) = 
O(k \log^3 n)$ time. However, if we simply scan from left to right instead of using 
a binary search for (a) and (c), that is, if we compute $\theta^{v_a,-}(W_{i-1})$ for $a = i, i+1, 
\ldots, h, h+1$ and $\theta^{x_1,+}(W_b)$ for $b = h+1, h+2, \ldots, l_1, l_1+1$, it takes $O((l_1 - 
i) \log^2 n)$ time to determine $\mathcal{P}_{i,l_1}$. In this way, we can isolate at most $q$ subnetworks 
in $O((j - i) \log^2 n) = O(n \log^2 n)$ time. 
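
The greedy structure of this test (place each sink as far right as possible, then extend its coverage as far right as possible) can be illustrated with the following simplified sketch. It departs from the algorithm of [7] in two ways, both of which are assumptions of this example: sinks are restricted to vertices (so Eq. (5.12) is not solved), and each evacuation time is evaluated by a direct linear scan adapted from Eqs. (5.1) and (5.2) instead of a CUE tree query. The function names are hypothetical.

```python
def left_time(W, c, l, tau, a, s):
    """Time for the supply at v_a..v_{s-1} to reach a sink at vertex v_s."""
    best, cap, d = 0.0, float("inf"), 0.0
    for p in range(s - 1, a - 1, -1):    # p = s-1, ..., a
        cap = min(cap, c[p - 1])         # cap = C_{p,s}
        d += l[p - 1]                    # d = L_{p,s}
        best = max(best, (W[p] - W[a - 1]) / cap + tau * d)
    return best


def right_time(W, c, l, tau, s, b):
    """Time for the supply at v_{s+1}..v_b to reach a sink at vertex v_s."""
    best, cap, d = 0.0, float("inf"), 0.0
    for p in range(s + 1, b + 1):
        cap = min(cap, c[p - 2])         # cap = C_{s,p}
        d += l[p - 2]                    # d = L_{s,p}
        best = max(best, (W[b] - W[p - 1]) / cap + tau * d)
    return best


def is_feasible(w, c, l, tau, i, j, q, t):
    """Can v_i..v_j be evacuated within time t using at most q vertex sinks?"""
    W = [0]
    for wi in w:
        W.append(W[-1] + wi)
    a = i                                # leftmost vertex not yet covered
    for _ in range(q):
        s = a                            # push the sink right while the left side fits
        while s < j and left_time(W, c, l, tau, a, s + 1) <= t:
            s += 1
        b = s                            # extend coverage right while it fits
        while b < j and right_time(W, c, l, tau, s, b + 1) <= t:
            b += 1
        if b >= j:
            return True
        a = b + 1
    return False
```

On the path with `w = [2, 3, 5]`, `c = [1, 4]`, `l = [2, 3]`, and `tau = 2`, a single vertex sink at $v_2$ achieves completion time 7.25, so the test with one sink succeeds for $t = 7.25$ and fails for $t = 7$.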
5.4.2 Solving the 1-Sink Problem 


Lemma 5.4 ([7]) Given a dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$ with $n$ 
vertices, suppose that the CUE tree is available. For any integers $i, j$ satisfying $i, j \in 
[1..n]$ with $i < j$, $\mathrm{CT}_{\mathrm{OPT}}(1, i, j)$ can be computed in $O(\log^3 n)$ time. Moreover, if 
the capacities of $\mathcal{P}$ are uniform, $\mathrm{CT}_{\mathrm{OPT}}(1, i, j)$ can be computed in $O(\log n)$ time. 


Proof We prove only the case of general capacities. See [7] for the case of uniform 
capacity. 
Recalling Eq. (5.11), we have 

\[
\begin{aligned}
\mathrm{CT}_{\mathrm{OPT}}(1, i, j) &= \min_{x \in P_{i,j}} \mathrm{CT}(\mathcal{P}_{i,j}, (x), ()) \\
&= \min_{x \in P_{i,j}} \max\left\{ \theta^{x,+}(W_j),\ \theta^{x,-}(W_{i-1}) \right\}.
\end{aligned}
\tag{5.13}
\]

Because $\theta^{x,+}(W_j)$ and $\theta^{x,-}(W_{i-1})$ are monotonically decreasing and monotonically 
increasing, respectively, in $x \in P_{i,j}$, if an integer $h \in [i..j]$ satisfies $\theta^{v_h,-}(W_{i-1}) \le 
\theta^{v_h,+}(W_j)$ and $\theta^{v_{h+1},-}(W_{i-1}) \ge \theta^{v_{h+1},+}(W_j)$, then there exists $x^*$ that minimizes 
$\mathrm{CT}(\mathcal{P}_{i,j}, (x), ())$ on edge $e_h$ including $v_h$ and $v_{h+1}$. We can apply binary search 
to compute this $h$, which can be done in $O(\log^3 n)$ time using the CUE tree (see 
Lemma 5.2). Once $h$ is determined, $x^*$ can be computed as follows: We solve 

\[ \theta^{v_{h+1},-}(W_{i-1}) - \alpha \cdot \tau \ell_h = \theta^{v_h,+}(W_j) - (1 - \alpha) \cdot \tau \ell_h \tag{5.14} \]

for $\alpha$ in $O(1)$ time. If $\alpha < 0$, let $x^* = v_{h+1}$ and compute $\mathrm{CT}_{\mathrm{OPT}}(1, i, j) = 
\mathrm{CT}(\mathcal{P}_{i,j}, (v_{h+1}), ())$. If $\alpha > 1$, let $x^* = v_h$ and compute $\mathrm{CT}_{\mathrm{OPT}}(1, i, j) = 
\mathrm{CT}(\mathcal{P}_{i,j}, (v_h), ())$. Otherwise, let $x^*$ be the point that divides edge $e_h = (v_h, v_{h+1})$ 
at a ratio of $1 - \alpha : \alpha$ and compute $\mathrm{CT}_{\mathrm{OPT}}(1, i, j) = \theta^{v_{h+1},-}(W_{i-1}) - \alpha \cdot \tau \ell_h = 
\theta^{v_h,+}(W_j) - (1 - \alpha) \cdot \tau \ell_h$. Using the CUE tree, we can compute these values in 
$O(\log^2 n)$ time. Thus, $\mathrm{CT}_{\mathrm{OPT}}(1, i, j)$ can be computed in $O(\log^3 n) + O(1) + 
O(\log^2 n) = O(\log^3 n)$ time. 
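
Equation (5.14) is linear in $\alpha$, so it can be solved in closed form as $\alpha = (\theta^{v_{h+1},-}(W_{i-1}) - \theta^{v_h,+}(W_j) + \tau \ell_h) / (2 \tau \ell_h)$. The sketch below applies this formula and the three-way case distinction from the proof. The function name `balance_point` and its return convention are assumptions of this example; the two $\theta$ values are taken as inputs (in [7] they would come from CUE tree queries).

```python
def balance_point(theta_minus_next, theta_plus_cur, tau, ell):
    """Solve Eq. (5.14) for alpha on edge e_h of length ell.

    theta_minus_next: theta^{v_{h+1},-}(W_{i-1}); theta_plus_cur: theta^{v_h,+}(W_j).
    Returns (alpha, location, cost); cost is None when the optimum is at a
    vertex, because that case needs a separate CT evaluation at the vertex.
    """
    alpha = (theta_minus_next - theta_plus_cur + tau * ell) / (2.0 * tau * ell)
    if alpha < 0:
        return alpha, "v_{h+1}", None
    if alpha > 1:
        return alpha, "v_h", None
    # Interior point dividing e_h at ratio (1 - alpha) : alpha.
    return alpha, "interior", theta_minus_next - alpha * tau * ell
```

For example, with $\theta^{v_{h+1},-} = 10$, $\theta^{v_h,+} = 8$, $\tau = 1$, and $\ell_h = 4$, the formula gives $\alpha = 0.75$ and both sides of Eq. (5.14) equal 7.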


5.4.3 Parametric Search Method 


In the parametric search method, we first compute the maximum integer $i_1$ such 
that $\mathcal{P}_{i_1+1,n}$ is not $(\mathrm{CT}_{\mathrm{OPT}}(1, 1, i_1), k-1)$-feasible and store $t_1 = \mathrm{CT}_{\mathrm{OPT}}(1, 1, i_1 + 
1)$ as a feasible value. Note that $t^* = \mathrm{CT}_{\mathrm{OPT}}(k, 1, n)$ satisfies $\mathrm{CT}_{\mathrm{OPT}}(1, 1, i_1) < 
t^* \le t_1$. To compute $i_1$, we apply binary search by executing $O(\log n)$ tests for 
$(\mathrm{CT}_{\mathrm{OPT}}(1, 1, a), k-1)$-feasibility of $\mathcal{P}_{a+1,n}$ over $1 \le a \le n$. For an integer $a$, we 
can compute $\mathrm{CT}_{\mathrm{OPT}}(1, 1, a)$ in $O(\log^3 n)$ time by Lemma 5.4. Also, by Lemma 5.3, 
we can test whether $\mathcal{P}_{a+1,n}$ is $(\mathrm{CT}_{\mathrm{OPT}}(1, 1, a), k-1)$-feasible in $O(k \log^3 n)$ time. 
Summarizing these arguments, we can compute $i_1$ and $t_1$ in $\{O(\log^3 n) + O(k \log^3 n)\} 
\times O(\log n) = O(k \log^4 n)$ time. Next, we compute the maximum integer $i_2$ such that 
$\mathcal{P}_{i_2+1,n}$ is not $(\mathrm{CT}_{\mathrm{OPT}}(1, i_1 + 1, i_2), k-2)$-feasible and store $t_2 = \mathrm{CT}_{\mathrm{OPT}}(1, i_1 + 
1, i_2 + 1)$ as a feasible value, which can be done in $O(k \log^4 n)$ time in the same 
manner as in the computation of $(i_1, t_1)$. Sequentially, we determine $(i_3, t_3), \ldots, (i_{k-1}, 
t_{k-1})$ in $(k - 3) \times O(k \log^4 n)$ time and eventually compute $t_k = \mathrm{CT}_{\mathrm{OPT}}(1, i_{k-1} + 
1, n)$ in $O(\log^3 n)$ time. Note that $t^* = \min\{t_i \mid i = 1, 2, \ldots, k\}$ holds, which can be 
computed in $O(k)$ time. We then execute a $(t^*, k)$-feasibility test for $\mathcal{P}$ in $O(k \log^3 n)$ 
time, so that the optimal k-sink is obtained. We thus see that the problem can be solved 
in $(k - 1) \times O(k \log^4 n) + O(\log^3 n) + O(k) + O(k \log^3 n) = O(k^2 \log^4 n)$ time 
once the CUE tree is constructed. Since it takes $O(n \log n)$ time to construct the 
CUE tree by Lemma 5.1, the total time complexity is $O(n \log n + k^2 \log^4 n)$. 

For the case of uniform capacity, the same argument holds. Applying 
Lemmas 5.1, 5.3, and 5.4, we have a total time complexity of $O(n + k^2 \log^2 n)$. 


5.4.4 Sorted Matrix Method 


A matrix A is sorted if and only if each row and column of A is sorted in non- 
decreasing order. The sorted matrix method is based on the following lemma shown 
in [11]: 


Lemma 5.5 ([11]) Consider a minimization problem $Q$ with an instance $I$ of size $n$. 
Suppose that the feasibility of any value for $I$ can be tested in $g(n)$ time. Let $A$ be an 
$n \times n$ sorted matrix such that each element can be computed in $f(n)$ time. Then, the 
minimum element of $A$ that is feasible for $Q$ can be found in $O(n f(n) + g(n) \log n)$ 
time. 


In [7], an $n \times n$ matrix $A$ is defined such that the $(i, j)$th entry of $A$ is given by 

\[ A[i, j] = \begin{cases} \mathrm{CT}_{\mathrm{OPT}}(1, n - i + 1, j) & \text{if } n - i + 1 \le j, \\ 0 & \text{otherwise.} \end{cases} \tag{5.15} \]

Note that we do not actually compute all the elements of $A$, but compute the element 
$A[i, j]$ on demand as needed. 

Matrix $A$ is sorted, because enlarging a subpath (by decreasing its left end $n - i + 1$ 
or increasing its right end $j$) never decreases the optimal 1-sink cost. It is also clear 
that matrix $A$ includes $\mathrm{CT}_{\mathrm{OPT}}(1, l, r)$ for every pair of integers $(l, r)$ such that 
$l, r \in [1..n]$ with $l \le r$. In addition, there exists a pair $(l, r)$ such that 
$\mathrm{CT}_{\mathrm{OPT}}(1, l, r) = \mathrm{CT}_{\mathrm{OPT}}(k, 1, n)$. These 
facts imply that the minimum element $A[i, j]$ such that $\mathcal{P}$ is $(A[i, j], k)$-feasible is 
$\mathrm{CT}_{\mathrm{OPT}}(k, 1, n)$, and hence we can apply Lemma 5.5 to solve the 
MINMAX-k-SINK-PATH-CONFLUENT-FLOW problem as follows: Once the CUE tree is constructed, 
we have $f(n) = O(\log^3 n)$ by Lemma 5.4 and $g(n) = O(n \log^2 n)$ by Lemma 5.3, 
so the problem can be solved in $O(n \log^3 n)$ time. Because it takes $O(n \log n)$ time 
to construct the CUE tree by Lemma 5.1, the total time complexity is $O(n \log n) + 
O(n \log^3 n) = O(n \log^3 n)$. 

For the case of uniform capacity, the same argument holds. Applying 
Lemmas 5.1, 5.3, and 5.4, we have a total time complexity of $O(n \log n)$. 
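
To make the role of sortedness concrete, the following sketch finds the minimum feasible element of a sorted matrix with a simple staircase walk. This is not the method behind Lemma 5.5: it performs up to $2n$ feasibility tests, giving $O(n(f(n) + g(n)))$ time, whereas the lemma needs only $O(\log n)$ tests. It assumes that feasibility is monotone (every value at least as large as a feasible value is feasible), and the names `entry` and `feasible` are hypothetical callbacks.

```python
def min_feasible_in_sorted_matrix(n, entry, feasible):
    """Return the smallest feasible element of an n x n sorted matrix, or None.

    entry(i, j): element at 0-based row i, column j, computed on demand.
    feasible(v): monotone predicate; rows and columns are non-decreasing.
    """
    best = None
    j = n - 1                            # start at the top-right corner
    for i in range(n):
        while j >= 0:
            v = entry(i, j)
            if not feasible(v):
                break                    # everything to the left is smaller: go down
            if best is None or v < best:
                best = v
            j -= 1                       # feasible: try smaller values to the left
        if j < 0:
            break                        # every column has been resolved
    return best
```

Each step either moves one column left or one row down, so at most $2n$ entries are evaluated and tested.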


5.5 Minsum k-Sink Problems on Paths 


In this section, our task is to find a valid pair (x, d) that minimizes the aggregate 
evacuation time AT(P, x, d). This task can be precisely represented as follows: 


(MINSUM-k-SINK-PATH) 
Input: A dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$. 
Goal: Find a solution $(x, d)$ to the problem 

\[
\begin{aligned}
\text{min.}\quad & \mathrm{AT}(\mathcal{P}, x, d) \\
\text{s.t.}\quad & x = (x_1, \ldots, x_k) \in P^k, \quad x_h < x_l \ \ \forall h < l, \\
& d = (d_1, \ldots, d_{k-1}) \in \mathbb{R}^{k-1}, \quad W_{\mathrm{Id}(x_h)} \le d_h \le W_{\mathrm{Id}(x_{h+1})} \ \ \forall h.
\end{aligned}
\]

(MINSUM-k-SINK-PATH-CONFLUENT-FLOW) 
Input: A dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$. 
Goal: Find a solution $(x, d)$ to the problem 

\[
\begin{aligned}
\text{min.}\quad & \mathrm{AT}(\mathcal{P}, x, d) \\
\text{s.t.}\quad & x = (x_1, \ldots, x_k) \in P^k, \quad x_h < x_l \ \ \forall h < l, \\
& d = (d_1, \ldots, d_{k-1}) \in \{W_h \mid h \in [1..n]\}^{k-1}, \quad W_{\mathrm{Id}(x_h)} \le d_h \le W_{\mathrm{Id}(x_{h+1})} \ \ \forall h.
\end{aligned}
\]

For the MINSUM-k-SINK-PATH problem, [16] reported the following result, which 
is the best so far: 


Theorem 5.2 ([16]) The MINSUM-k-SINK-PATH/MINSUM-k-SINK-PATH-CONFLUENT-FLOW 
problems can be solved in $\min\{O(kn \log^3 n),\ n 2^{O(\sqrt{\log k \log\log n})} \log^3 n\}$ 
time. Moreover, if the capacities of $\mathcal{P}$ are uniform, then both the problems 
can be solved in $\min\{O(kn \log^2 n),\ n 2^{O(\sqrt{\log k \log\log n})} \log^2 n\}$ time. 


For the confluent flow model, it was shown in [6, 15] that for the minsum k-sink 
problems, there exists an optimal k-sink such that all of the k sinks are at vertices. 
[16] extended this fact to the non-confluent flow model. 


Lemma 5.6 ([6, 15, 16]) For the minsum k-sink problem in a dynamic flow path 
network, there exists an optimal k-sink such that all of the k sinks are at vertices 
under the confluent/non-confluent flow model. 


Lemma 5.6 implies that it is sufficient to consider only the case where every sink is 
at a vertex. Thus, we suppose $x = (x_1, \ldots, x_k) \in V^k$, where $x_i < x_j$ for $i < j$. 



The fundamental idea of [16] for solving the MINSUM-k-SINK-PATH problem is to 
reduce it to the minimum k-link path problem. In the minimum k-link path problem, 
we are given a weighted complete directed acyclic graph (DAG) $G = (V', E', w')$ 
with $V' = \{v'_i \mid i \in [1..n]\}$ and $E' = \{(v'_i, v'_j) \mid i, j \in [1..n],\ i < j\}$. Each edge 
$(v'_i, v'_j)$ is associated with a weight $w'(i, j)$. A k-link path is a path that contains 
exactly $k$ edges. The task is to find a k-link path from $v'_1$ to $v'_n$ that minimizes the sum 
of weights of its $k$ edges. The minimum k-link path problem is represented as follows: 


(MINIMUM-k-LINK-PATH) 
Input: A weighted complete DAG $G = (V', E', w')$. 

Goal: Find a k-link path $(v'_{a_0} = v'_1, v'_{a_1}, v'_{a_2}, \ldots, v'_{a_{k-1}}, v'_{a_k} = v'_n)$ from $v'_1$ to $v'_n$ 
that solves 

\[
\begin{aligned}
\text{min.}\quad & \sum_{i=1}^{k} w'(a_{i-1}, a_i) \\
\text{s.t.}\quad & a_i \in [1..n], \quad a_0 = 1, \quad a_k = n, \quad a_h < a_l \ \ \forall h < l.
\end{aligned}
\]


Schieber [19] showed that the MINIMUM-k-LINK-PATH can be solved in almost 
linear time³ regardless of $k$ if the weight function $w'$ satisfies the concave Monge 
property. 

Definition 5.1 (Concave Monge property) We say that a function $f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}$ 
satisfies the concave Monge property if for any integers $i, j$ with $i + 1 < j$, 
$f(i, j) + f(i+1, j+1) \le f(i+1, j) + f(i, j+1)$ holds. 
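
A brute-force check of Definition 5.1 over a finite index range can be sketched as follows. The function name `is_concave_monge` and the range parameters `lo` and `hi` are assumptions of this example; with floating-point weights a small tolerance would be needed in the comparison.

```python
def is_concave_monge(f, lo, hi):
    """Check f(i,j) + f(i+1,j+1) <= f(i+1,j) + f(i,j+1) on a finite range.

    All index pairs with lo <= i, i + 1 < j, and j + 1 <= hi are tested,
    which takes O((hi - lo)^2) evaluations of f.
    """
    for i in range(lo, hi):
        for j in range(i + 2, hi):
            if f(i, j) + f(i + 1, j + 1) > f(i + 1, j) + f(i, j + 1):
                return False
    return True
```

For instance, $f(i, j) = (j - i)^2$ satisfies the property, since the left-hand side is $2(j-i)^2$ and the right-hand side is $2(j-i)^2 + 2$, whereas $f(i, j) = -(j - i)^2$ does not.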


Lemma 5.7 ([19]) Given a weighted complete DAG with $n$ vertices, if the weight 
function satisfies the concave Monge property, the MINIMUM-k-LINK-PATH can be 
solved in $\min\{O(kn),\ n 2^{O(\sqrt{\log k \log\log n})}\}$ time. 
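
For comparison with Lemma 5.7, the MINIMUM-k-LINK-PATH problem can always be solved by a straightforward dynamic program in $O(kn^2)$ weight evaluations, without using the concave Monge property. The sketch below is this naive baseline, not Schieber's algorithm; the function name and return convention are assumptions of this example.

```python
def min_k_link_path(n, k, weight):
    """Naive O(k n^2) dynamic program for the minimum k-link path problem.

    Vertices are 1..n; weight(i, j) is defined for i < j.
    Returns (cost, path) for a cheapest path from 1 to n with exactly k edges,
    or (inf, None) if no such path exists.
    """
    INF = float("inf")
    # best[m][j]: minimum total weight of an m-link path from vertex 1 to j.
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    pred = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][1] = 0
    for m in range(1, k + 1):
        for j in range(2, n + 1):
            for i in range(1, j):
                if best[m - 1][i] < INF:
                    cand = best[m - 1][i] + weight(i, j)
                    if cand < best[m][j]:
                        best[m][j], pred[m][j] = cand, i
    if best[k][n] == INF:
        return INF, None
    path, j = [n], n
    for m in range(k, 0, -1):            # walk predecessors back to vertex 1
        j = pred[m][j]
        path.append(j)
    return best[k][n], path[::-1]
```

With $n = 4$ and $w'(i, j) = (j - i)^2$, the cheapest 2-link path costs 5 and the cheapest 3-link path costs 3.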


Higashikawa et al. [16] presented a reduction from MINSUM-k-SINK-PATH to 
MINIMUM-(k + 1)-LINK-PATH such that the weight function $w'$ satisfies the concave 
Monge property. Let a dynamic flow path network $\mathcal{P} = (P = (V, E), w, c, l, \tau)$ with 
$n$ vertices be an instance of MINSUM-k-SINK-PATH. We prepare a weighted complete 
DAG $G = (V', E', w')$ with $n + 2$ vertices, where $V' = \{v'_i \mid i \in [0..n+1]\}$ and 
$E' = \{(v'_i, v'_j) \mid i, j \in [0..n+1],\ i < j\}$. We set the weight function $w'$ as 

\[ w'(i, j) = \begin{cases} \mathrm{AT}_{\mathrm{OPT}}(i, j) & i, j \in [1..n],\ i < j, \\ \mathrm{AT}(\mathcal{P}_{i,n}, (v_i), ()) & i \in [1..n] \text{ and } j = n + 1, \\ \mathrm{AT}(\mathcal{P}_{1,j}, (v_j), ()) & i = 0 \text{ and } j \in [1..n], \\ \infty & i = 0 \text{ and } j = n + 1, \end{cases} \tag{5.16} \]

where $\mathrm{AT}_{\mathrm{OPT}}(i, j)$ is the optimal aggregate evacuation time required to move all 
the supply between $v_i$ and $v_j$ to one of two sinks $v_i$ or $v_j$. On the weighted 
complete DAG $G$ constructed as above, let us consider a $(k + 1)$-link path 
$(v'_{a_0} = v'_0, v'_{a_1}, \ldots, v'_{a_k}, v'_{a_{k+1}} = v'_{n+1})$ from $v'_0$ to $v'_{n+1}$, where $a_1, \ldots, a_k$ are integers 
satisfying $0 < a_1 < a_2 < \cdots < a_k < n + 1$. The sum of weights of this $(k + 1)$-link 
path is 

\[ \sum_{i=0}^{k} w'(a_i, a_{i+1}) = \mathrm{AT}(\mathcal{P}_{1,a_1}, (v_{a_1}), ()) + \sum_{i=1}^{k-1} \mathrm{AT}_{\mathrm{OPT}}(a_i, a_{i+1}) + \mathrm{AT}(\mathcal{P}_{a_k,n}, (v_{a_k}), ()). \]

This value is equivalent to $\min_d \mathrm{AT}(\mathcal{P}, x, d)$ for a k-sink $x = (v_{a_1}, v_{a_2}, \ldots, v_{a_k})$, 
which implies that a minimum $(k + 1)$-link path on $G$ corresponds to an optimal 
k-sink location for a dynamic flow path network $\mathcal{P}$. 


³ Note that we assume that the weight function $w'$ is not given explicitly, but that a value $w'(i, j)$ 
can be obtained in constant time whenever required. 

Let us consider the following subtasks: 


(MINSUM-FLOW-FOR-SUBPATH) 

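Input: A dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$, integers $i, j \in [1..n]$ with 
$i < j$. 

Goal: Find a value $d$ that solves 

\[
\begin{aligned}
\text{min.}\quad & \mathrm{AT}(\mathcal{P}_{i,j}, x = (v_i, v_j), d = (d)) \\
\text{s.t.}\quad & W_i \le d \le W_{j-1}.
\end{aligned}
\]

(MINSUM-FLOW-FOR-SUBPATH-CONFLUENT-FLOW) 

Input: A dynamic flow path network $\mathcal{P} = (P, w, c, l, \tau)$, integers $i, j \in [1..n]$ with 
$i < j$. 

Goal: Find a value $d$ that solves 

\[
\begin{aligned}
\text{min.}\quad & \mathrm{AT}(\mathcal{P}_{i,j}, x = (v_i, v_j), d = (d)) \\
\text{s.t.}\quad & d \in \{W_h \mid h \in [i..j-1]\}.
\end{aligned}
\]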


Note that [16] developed a data structure to efficiently solve both of these 
problems. This data structure can be constructed in $O(n \log^2 n)$ time and can be used to 
solve MINSUM-FLOW-FOR-SUBPATH/MINSUM-FLOW-FOR-SUBPATH-CONFLUENT-FLOW 
in $O(\log^3 n)$ time. See [16] for details. 


Lemma 5.8 ([16]) For a given dynamic flow path network $\mathcal{P}$ with $n$ vertices, there 
exists a segment tree $\mathcal{T}$ that satisfies the following conditions: 


1. $\mathcal{T}$ can be constructed in $O(n \log^2 n)$ time. 

2. MINSUM-FLOW-FOR-SUBPATH/MINSUM-FLOW-FOR-SUBPATH-CONFLUENT-FLOW 
can be solved in $O(\log^3 n)$ time by using $\mathcal{T}$. 

3. If the capacities of $\mathcal{P}$ are uniform, then MINSUM-FLOW-FOR-SUBPATH/MINSUM-FLOW-FOR-SUBPATH-CONFLUENT-FLOW 
can be solved in $O(\log^2 n)$ time by using $\mathcal{T}$. 


Because w' satisfies the concave Monge property (see Sect. 5.5.2), Lemmas 5.7 
and 5.8 lead to Theorem 5.2. 
In the rest of this section, we first observe the properties of the aggregate evacuation 
time in Sect. 5.5.1 and then show that the weight function $w'$ obtained by the 
reduction satisfies the concave Monge property in Sect. 5.5.2. 


5.5.1 Property of Aggregate Evacuation Time 


Recalling that we consider only the case where every sink is at a vertex, we simply 
use $\theta^{i,+}(z)$ and $\theta^{i,-}(z)$ instead of $\theta^{v_i,+}(z)$ and $\theta^{v_i,-}(z)$, respectively. 

We next give the general form of the aggregate evacuation time. Let $\phi^{i,+}(z)$ denote 
the aggregate evacuation time when the first $z - W_i$ of supply on the right side of $v_i$ 
flows to sink $v_i$. Similarly, we denote by $\phi^{i,-}(z)$ the aggregate evacuation time when 
the first $W_{i-1} - z$ of supply on the left side of $v_i$ flows to sink $v_i$. Therefore, we have 

\[
\begin{aligned}
\phi^{i,+}(z) &= \int_{W_i}^{z} \theta^{i,+}(t)\,dt = \int_{0}^{z} \theta^{i,+}(t)\,dt \quad \text{and} \\
\phi^{i,-}(z) &= \int_{z}^{W_{i-1}} \theta^{i,-}(t)\,dt = \int_{z}^{W_n} \theta^{i,-}(t)\,dt = -\int_{W_n}^{z} \theta^{i,-}(t)\,dt
\end{aligned}
\tag{5.17}
\]

(see Fig. 5.2). For $i, j \in [1..n]$ with $i < j$, we define 

\[ \phi^{i,j}(z) = \phi^{i,+}(z) + \phi^{j,-}(z) = \int_{0}^{z} \theta^{i,+}(t)\,dt + \int_{z}^{W_n} \theta^{j,-}(t)\,dt \tag{5.18} \]

for $z \in [W_i, W_{j-1}]$. 


Fig. 5.2 Thick half-open segments represent the function $\theta^{i,+}(t)$ and the gray area represents 
$\phi^{i,+}(z)$ for some $z > W_i$ 
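Suppose that we are given a k-sink $x = (x_1, \ldots, x_k) \in V^k$ and a divider $d = 
(d_1, \ldots, d_{k-1})$. Recalling the definition of $\mathrm{Id}(p)$ for $p \in P$, we have $x_i = v_{\mathrm{Id}(x_i)}$ for 
all $i \in [1..k]$. Because each sink is at a vertex, by simply modifying the integration 
intervals in Eq. (5.10), the aggregate evacuation time $\mathrm{AT}(\mathcal{P}, x, d)$ is given by 

\[
\begin{aligned}
\mathrm{AT}(\mathcal{P}, x, d) &= \sum_{i \in [1..k]} \left( \int_{d_{i-1}}^{W_{\mathrm{Id}(x_i)-1}} \theta^{x_i,-}(z)\,dz + \int_{W_{\mathrm{Id}(x_i)}}^{d_i} \theta^{x_i,+}(z)\,dz \right) \\
&= \sum_{i \in [1..k]} \left( \int_{d_{i-1}}^{W_{\mathrm{Id}(x_i)-1}} \theta^{\mathrm{Id}(x_i),-}(z)\,dz + \int_{W_{\mathrm{Id}(x_i)}}^{d_i} \theta^{\mathrm{Id}(x_i),+}(z)\,dz \right).
\end{aligned}
\tag{5.19}
\]

By Eqs. (5.17), (5.18) and (5.19), we have 

\[
\begin{aligned}
\mathrm{AT}(\mathcal{P}, x, d) &= \sum_{i \in [1..k]} \left( \int_{d_{i-1}}^{W_{\mathrm{Id}(x_i)-1}} \theta^{\mathrm{Id}(x_i),-}(z)\,dz + \int_{W_{\mathrm{Id}(x_i)}}^{d_i} \theta^{\mathrm{Id}(x_i),+}(z)\,dz \right) \\
&= \sum_{i \in [1..k]} \left( \phi^{\mathrm{Id}(x_i),-}(d_{i-1}) + \phi^{\mathrm{Id}(x_i),+}(d_i) \right) \\
&= \phi^{\mathrm{Id}(x_1),-}(0) + \sum_{i \in [1..k-1]} \phi^{\mathrm{Id}(x_i),\mathrm{Id}(x_{i+1})}(d_i) + \phi^{\mathrm{Id}(x_k),+}(W_n).
\end{aligned}
\tag{5.20}
\]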


In the rest of this section, we show the important properties of $\phi^{i,j}(z)$. Let us first 
confirm that by Eq. (5.17), both $\phi^{i,+}(z)$ and $\phi^{j,-}(z)$ are convex in $z$ since $\theta^{i,+}(z)$ 
and $-\theta^{j,-}(z)$ are non-decreasing in $z$, and therefore $\phi^{i,j}(z)$ is convex in $z$. We have 
a more useful lemma that gives the conditions for the minimizer of $\phi^{i,j}(z)$. 


Lemma 5.9 ([16]) For any $i, j \in [1..n]$ with $i < j$, there uniquely exists 

\[ z^* \in \operatorname*{argmin}_{z \in [W_i, W_{j-1}]} \max\{ \theta^{i,+}(z),\ \theta^{j,-}(z) \}. \]

Furthermore, $\phi^{i,j}(z)$ is minimized on $[W_i, W_{j-1}]$ when $z = z^*$. 


Proof By Eqs. (5.4) and (5.5), $\theta^{i,+}(z)$ is strictly increasing in $z \in [W_i, W_n]$. 
Similarly, by Eqs. (5.6) and (5.7), $\theta^{j,-}(z)$ is strictly decreasing in $z \in [0, W_{j-1}]$. Thus, 
there uniquely exists $z^* \in [W_i, W_{j-1}]$. 

We then see that for any $z' \in [W_i, z^*]$, 

\[
\begin{aligned}
\phi^{i,j}(z^*) - \phi^{i,j}(z') &= \phi^{i,+}(z^*) + \phi^{j,-}(z^*) - \left( \phi^{i,+}(z') + \phi^{j,-}(z') \right) \\
&= \int_{z'}^{z^*} \theta^{i,+}(t)\,dt - \int_{z'}^{z^*} \theta^{j,-}(t)\,dt \\
&= \int_{z'}^{z^*} \left\{ \theta^{i,+}(t) - \theta^{j,-}(t) \right\} dt \le 0,
\end{aligned}
\]

and for any $z'' \in [z^*, W_{j-1}]$, 

\[
\begin{aligned}
\phi^{i,j}(z^*) - \phi^{i,j}(z'') &= \phi^{i,+}(z^*) + \phi^{j,-}(z^*) - \left( \phi^{i,+}(z'') + \phi^{j,-}(z'') \right) \\
&= -\int_{z^*}^{z''} \theta^{i,+}(t)\,dt + \int_{z^*}^{z''} \theta^{j,-}(t)\,dt \\
&= -\int_{z^*}^{z''} \left\{ \theta^{i,+}(t) - \theta^{j,-}(t) \right\} dt \le 0,
\end{aligned}
\]

which imply that $z^*$ minimizes $\phi^{i,j}(z)$ on $[W_i, W_{j-1}]$. 


In the following sections, this $z^*$ is called the pseudo-intersection point⁴ of $\theta^{i,+}(z)$ 
and $\theta^{j,-}(z)$. 
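
Since one function is increasing and the other decreasing, the pseudo-intersection point can be approximated numerically by bisection, even when the functions are discontinuous. The following sketch is only an illustration of this idea and is not the procedure used in [16], which relies on a dedicated data structure; the function name and the iteration count are assumptions of this example.

```python
def pseudo_intersection(f, g, lo, hi, iters=60):
    """Approximate the minimizer of max(f, g) on [lo, hi] by bisection.

    f must be increasing and g decreasing on [lo, hi]; neither function
    needs to be continuous.
    """
    if f(lo) >= g(lo):
        return lo                        # f dominates everywhere: minimum at lo
    if f(hi) <= g(hi):
        return hi                        # g dominates everywhere: minimum at hi
    # Invariant: f(lo) < g(lo) and f(hi) >= g(hi).
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < g(mid):
            lo = mid
        else:
            hi = mid
    return hi
```

For example, for $f(z) = z$ and $g(z) = 10 - z$ on $[0, 10]$ the result is 5, and for a function $f$ that jumps upward at $z = 2$ the crossing is located at the jump.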


5.5.2 Concave Monge Property 


We now show that the function $w'$ defined in Eq. (5.16) satisfies the concave Monge 
property under the non-confluent flow model. We omit the proof for the confluent 
flow model, since the proof can be constructed similarly to the one for the non-confluent 
flow model. See [16] for details. 

Let us give some observations of $\mathrm{AT}_{\mathrm{OPT}}(i, j)$. Under the non-confluent flow 
model, for any $i, j \in [1..n]$ with $i < j$, $\mathrm{AT}_{\mathrm{OPT}}(i, j) = \min_{z \in [W_i, W_{j-1}]} \phi^{i,j}(z)$. 
Lemma 5.9 implies that $\phi^{i,j}(z)$ on $[W_i, W_{j-1}]$ is minimized when $z$ is the 
pseudo-intersection point of $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$. For any $i, j \in [1..n]$ with $i < j$, let $\alpha^{i,j}$ 
denote the pseudo-intersection point of $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$. 

Thus, we have 

\[ \mathrm{AT}_{\mathrm{OPT}}(i, j) = \phi^{i,j}(\alpha^{i,j}) = \int_{0}^{\alpha^{i,j}} \theta^{i,+}(z)\,dz + \int_{\alpha^{i,j}}^{W_n} \theta^{j,-}(z)\,dz. \tag{5.21} \]


We give the following two lemmas. 


Lemma 5.10 (/16]) For any integer i € [1..n — 1] and any z € [0, №,], 
0^* (z) > ++ (2) and 0° (2) < gi*.-(z) 


hold. 


4 The reason why we use the term "pseudo-intersection" is that the two functions $\theta^{i,+}(z)$ and
$\theta^{j,-}(z)$ are not continuous, in general, whereas the "intersection" is usually defined for continuous
functions.


Proof We give the proof of only $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ because the other case can be
proved in a similar way. By Eq. (5.5), for any $j \in [i+2..n]$, we have
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0 ifz < Wj-1, 
9+ (z) — gi* tz = и: ++ if Wj- <z < Wj, 


+r- G if > W;. 


(z-Wi-1(Ci,;- Ci) 
Ci.jCizij 


Because $C_{i+1,j} - C_{i,j} = \min\{c_h \mid h \in [i+1..j-1]\} - \min\{c_h \mid h \in [i..j-1]\} \ge 0$,
$\theta^{i,+,j}(z) - \theta^{i+1,+,j}(z) \ge 0$ holds. Therefore, we have $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ since
$\theta^{i,+}(z) = \max\{\theta^{i,+,j}(z) \mid j \in [i+1..n]\}$ by Eq. (5.4).


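Lemma 5.11 ([16]) For any $i, j \in [1..n]$ with $i < j$,

\[
\alpha^{i,j} \le \alpha^{i+1,j} \le \alpha^{i+1,j+1} \quad\text{and}\quad \alpha^{i,j} \le \alpha^{i,j+1} \le \alpha^{i+1,j+1}
\]

hold.


Proof We give the proof of only $\alpha^{i,j} \le \alpha^{i+1,j}$ because the other cases can be proved
in a similar way. For any $i, j \in [1..n]$ with $i < j$ and positive constant $\epsilon$, we have

\[
\theta^{i+1,+}(\alpha^{i,j} - \epsilon) \le \theta^{i,+}(\alpha^{i,j} - \epsilon) \le \theta^{j,-}(\alpha^{i,j} - \epsilon)
\]

because $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ holds by Lemma 5.10 and $\theta^{j,-}(z)$ is a non-increasing
function. This implies that $\alpha^{i,j} \le \alpha^{i+1,j}$ holds, which completes the proof.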


Let us show that the function w' defined in Eq. (5.16) satisfies the concave Monge 
property under the non-confluent flow model. 


Lemma 5.12 ([16]) The weight function w' defined in Eq. (5.16) satisfies the con- 
cave Monge property under the non-confluent flow model. 
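
As a concrete reading of this statement, the concave Monge property is an inequality on every pair of neighbouring index pairs, $w(i, j) + w(i+1, j+1) \le w(i, j+1) + w(i+1, j)$ for $i < j$. The Python sketch below checks that inequality by brute force. The function name `satisfies_concave_monge` and the two toy weight functions are assumptions made for illustration only; they are not the $w'$ of Eq. (5.16).

```python
def satisfies_concave_monge(w, n):
    """Check w(i,j) + w(i+1,j+1) <= w(i,j+1) + w(i+1,j) for all 0 <= i < j <= n."""
    for i in range(n + 1):
        for j in range(i + 1, n + 1):
            if w(i, j) + w(i + 1, j + 1) > w(i, j + 1) + w(i + 1, j):
                return False
    return True


def toy_quadratic(i, j):
    # Hypothetical weight: a convex function of the gap j - i.
    return (j - i) ** 2


def toy_negated(i, j):
    # Hypothetical weight that violates the inequality.
    return -((j - i) ** 2)


print(satisfies_concave_monge(toy_quadratic, 6))
print(satisfies_concave_monge(toy_negated, 6))
```

For the quadratic toy weight the two sides differ by exactly 2 for every pair, so the check succeeds; negating the weight flips the inequality and the check fails.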


Proof It suffices to show that, for any $i, j \in [0..n]$ with $i < j$,

\[
w'(i, j) + w'(i+1, j+1) \le w'(i, j+1) + w'(i+1, j) \tag{5.22}
\]

holds. Note that condition (5.22) holds for $i = 0$ and
$j = n$, because the right-hand side of (5.22) contains $w'(0, n+1) = \infty$ and the other
terms are finite. Let us consider the following three cases: (1) $0 < i < j < n$, (2)
$i = 0$ and $0 < j < n$, (3) $0 < i < n$ and $j = n$.

Case 1. Consider the case of $0 < i < j < n$. By Eq. (5.16), for any $(i', j') \in
\{(i, j), (i, j+1), (i+1, j), (i+1, j+1)\}$, we have $w'(i', j') = \mathrm{AT}_{\mathrm{OPT}}(i', j')$.
Recall that $\alpha^{i,j}$ is the pseudo-intersection point of $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$, and we have

\[
w'(i, j) = \mathrm{AT}_{\mathrm{OPT}}(i, j) = \Phi^{i,j}(\alpha^{i,j}) = \int_0^{\alpha^{i,j}} \theta^{i,+}(z)\,dz + \int_{\alpha^{i,j}}^{W_n} \theta^{j,-}(z)\,dz. \tag{5.23}
\]


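For any $i, j \in [1..n-1]$ with $i < j$, Eq. (5.23) and Lemma 5.11 yield

\[
\begin{aligned}
& w'(i, j+1) + w'(i+1, j) - w'(i, j) - w'(i+1, j+1) \\
&= \Phi^{i,j+1}(\alpha^{i,j+1}) + \Phi^{i+1,j}(\alpha^{i+1,j}) - \Phi^{i,j}(\alpha^{i,j}) - \Phi^{i+1,j+1}(\alpha^{i+1,j+1}) \\
&= \int_{\alpha^{i,j}}^{\alpha^{i,j+1}} \theta^{i,+}(z)\,dz + \int_{\alpha^{i,j+1}}^{\alpha^{i+1,j+1}} \theta^{j+1,-}(z)\,dz \\
&\quad - \int_{\alpha^{i,j}}^{\alpha^{i+1,j}} \theta^{j,-}(z)\,dz - \int_{\alpha^{i+1,j}}^{\alpha^{i+1,j+1}} \theta^{i+1,+}(z)\,dz.
\end{aligned} \tag{5.24}
\]

Now, we show that for any $z \in [\alpha^{i,j}, \alpha^{i+1,j+1})$,

\[
\min\{\theta^{i,+}(z), \theta^{j+1,-}(z)\} \ge \max\{\theta^{j,-}(z), \theta^{i+1,+}(z)\}
\]

holds. First, for any $z \in [0, W_n]$, $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ and $\theta^{j,-}(z) \le \theta^{j+1,-}(z)$ hold
by Lemma 5.10. For any $z > \alpha^{i,j}$, $\theta^{i,+}(z) \ge \theta^{j,-}(z)$ holds because $\alpha^{i,j}$ is the
pseudo-intersection point of $\theta^{i,+}(z)$ and $\theta^{j,-}(z)$. Similarly, for any $z < \alpha^{i+1,j+1}$, we have
$\theta^{i+1,+}(z) \le \theta^{j+1,-}(z)$. Therefore, for any $z \in [\alpha^{i,j}, \alpha^{i+1,j+1})$,
$\min\{\theta^{i,+}(z), \theta^{j+1,-}(z)\} \ge \max\{\theta^{j,-}(z), \theta^{i+1,+}(z)\}$ holds.

Thus, Eq. (5.24) continues as

\[
\begin{aligned}
& w'(i, j+1) + w'(i+1, j) - w'(i, j) - w'(i+1, j+1) \\
&\ge \int_{\alpha^{i,j}}^{\alpha^{i+1,j+1}} \min\{\theta^{i,+}(z), \theta^{j+1,-}(z)\}\,dz - \int_{\alpha^{i,j}}^{\alpha^{i+1,j+1}} \max\{\theta^{j,-}(z), \theta^{i+1,+}(z)\}\,dz \\
&\ge 0,
\end{aligned}
\]

and then condition (5.22) holds for any $i, j$ with $0 < i < j < n$.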
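Case 2. Consider the case of $i = 0$ and $j \in [1..n-1]$. Recall that $w'(0, j) =
\Phi^{j,-}(0)$ and $w'(0, j+1) = \Phi^{j+1,-}(0)$ by Eq. (5.16). In this case, we have

\[
\begin{aligned}
& w'(0, j+1) + w'(1, j) - w'(0, j) - w'(1, j+1) \\
&= \Phi^{j+1,-}(0) + \Phi^{1,j}(\alpha^{1,j}) - \Phi^{j,-}(0) - \Phi^{1,j+1}(\alpha^{1,j+1}) \\
&= \int_0^{W_n} \theta^{j+1,-}(z)\,dz + \int_0^{\alpha^{1,j}} \theta^{1,+}(z)\,dz + \int_{\alpha^{1,j}}^{W_n} \theta^{j,-}(z)\,dz \\
&\quad - \int_0^{W_n} \theta^{j,-}(z)\,dz - \int_0^{\alpha^{1,j+1}} \theta^{1,+}(z)\,dz - \int_{\alpha^{1,j+1}}^{W_n} \theta^{j+1,-}(z)\,dz \\
&= \int_0^{\alpha^{1,j+1}} \theta^{j+1,-}(z)\,dz - \int_0^{\alpha^{1,j}} \theta^{j,-}(z)\,dz - \int_{\alpha^{1,j}}^{\alpha^{1,j+1}} \theta^{1,+}(z)\,dz,
\end{aligned}
\]

where the last equality uses $\alpha^{1,j} \le \alpha^{1,j+1}$ by Lemma 5.11. By Lemma 5.10, we have
$\theta^{j+1,-}(z) \ge \theta^{j,-}(z)$ for any $z \in [0, W_n]$. Using the same argument as in the previous
case, for any $z < \alpha^{1,j+1}$, we have $\theta^{1,+}(z) \le \theta^{j+1,-}(z)$. Thus, we have

\[
\begin{aligned}
& w'(0, j+1) + w'(1, j) - w'(0, j) - w'(1, j+1) \\
&= \int_0^{\alpha^{1,j+1}} \theta^{j+1,-}(z)\,dz - \int_0^{\alpha^{1,j}} \theta^{j,-}(z)\,dz - \int_{\alpha^{1,j}}^{\alpha^{1,j+1}} \theta^{1,+}(z)\,dz \\
&= \int_0^{\alpha^{1,j}} \{\theta^{j+1,-}(z) - \theta^{j,-}(z)\}\,dz + \int_{\alpha^{1,j}}^{\alpha^{1,j+1}} \{\theta^{j+1,-}(z) - \theta^{1,+}(z)\}\,dz \ge 0.
\end{aligned}
\]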


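Case 3. Consider the case of $j = n$ and $i \in [1..n-1]$. Recall that $w'(i, n+1) =
\Phi^{i,+}(W_n)$ and $w'(i+1, n+1) = \Phi^{i+1,+}(W_n)$ by Eq. (5.16). Similar to the second
case, we use the facts that $\alpha^{i,n} \le \alpha^{i+1,n}$ by Lemma 5.11, $\theta^{i,+}(z) \ge \theta^{i+1,+}(z)$ for any
$z \in [0, W_n]$ by Lemma 5.10, and $\theta^{i,+}(z) \ge \theta^{n,-}(z)$ for any $z > \alpha^{i,n}$. Then, we have

\[
\begin{aligned}
& w'(i, n+1) + w'(i+1, n) - w'(i, n) - w'(i+1, n+1) \\
&= \Phi^{i,+}(W_n) + \Phi^{i+1,n}(\alpha^{i+1,n}) - \Phi^{i,n}(\alpha^{i,n}) - \Phi^{i+1,+}(W_n) \\
&= \int_0^{W_n} \theta^{i,+}(z)\,dz + \int_0^{\alpha^{i+1,n}} \theta^{i+1,+}(z)\,dz + \int_{\alpha^{i+1,n}}^{W_n} \theta^{n,-}(z)\,dz \\
&\quad - \int_0^{\alpha^{i,n}} \theta^{i,+}(z)\,dz - \int_{\alpha^{i,n}}^{W_n} \theta^{n,-}(z)\,dz - \int_0^{W_n} \theta^{i+1,+}(z)\,dz \\
&= \int_{\alpha^{i,n}}^{W_n} \theta^{i,+}(z)\,dz - \int_{\alpha^{i,n}}^{\alpha^{i+1,n}} \theta^{n,-}(z)\,dz - \int_{\alpha^{i+1,n}}^{W_n} \theta^{i+1,+}(z)\,dz \\
&= \int_{\alpha^{i,n}}^{\alpha^{i+1,n}} \{\theta^{i,+}(z) - \theta^{n,-}(z)\}\,dz + \int_{\alpha^{i+1,n}}^{W_n} \{\theta^{i,+}(z) - \theta^{i+1,+}(z)\}\,dz \ge 0.
\end{aligned}
\]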


Thus, for any $i, j \in [0..n]$ with $i < j$, condition (5.22) holds. This implies that
the function $w'$ satisfies the concave Monge condition.


Part III
Sublinear Data Structures


Chapter 6
Information Processing on Compressed Data


Yoshimasa Takabatake, Tomohiro I, and Hiroshi Sakamoto 


Abstract We survey our recent work related to information processing on compressed
strings. Note that a "string" here means any fixed-length sequence of symbols
and therefore includes not only ordinary text but also a wide range of data,
such as pixel sequences and time-series data. Over the past two decades, a variety
of algorithms and their applications have been proposed for compressed information
processing. In this survey, we mainly focus on two problems: recompression and
privacy-preserving computation over compressed strings. Recompression is a framework
in which algorithms transform given compressed data into another compressed
format without decompression. Recent studies have shown that a higher compression
ratio can be achieved at lower cost by using an appropriate recompression algorithm
as preprocessing. Furthermore, various privacy-preserving computation models
have been proposed for information retrieval, similarity computation, and pattern
mining.


6.1 Restructuring Compressed Data 


Data compression plays a central role in the efficient transmission and storage of 
data. Recent developments have also shown that data compression is a useful tool for 
processing highly repetitive data which contains long common substrings. Typical 
examples of highly repetitive data include collections of genomes taken from simi- 
lar species and versioned documents. Popular compressors for highly repetitive data 
include Lempel-Ziv 77 (LZ77) [40], run-length encoded Burrows-Wheeler transform 
(RLBWT) [8], and grammar-based compression [34]. For each of these compression 
methods, researchers have developed techniques for operating on compressed data. 
For example, there are indexes based on LZ77 [37], RLBWT [17], and grammar-based
compression [11]. Although recent studies [33, 36, 45] have investigated the
fundamentals of these techniques and obtained a unified view of the compressibility
of highly repetitive data, each compressed format still has pros and cons that cannot
be ignored in practice. LZ77 usually achieves better compression than other compression
methods, the index based on RLBWT (called r-index) supports very fast
pattern search, and grammar-based compression is easy to handle in both theory and
practice. Thus, in order to take advantage of the virtues of the different compressed
formats, it is useful to have algorithms that can efficiently convert one compressed
format to another. In this section, we present some examples of these algorithms.


6.1.1 Preliminaries 


Let $\Sigma$ be an ordered alphabet, that is, a set of characters that has a total order. A
string over $\Sigma$ is a sequence of characters chosen from $\Sigma$. The length of a string $w$
is denoted by $|w|$. For any $1 \le i \le |w|$, the $i$th character of $w$ is denoted by $w[i]$.
The substring of $w$ starting at $i$ and ending at $j$ is denoted by $w[i \ldots j]$. The substring
$w[i \ldots j]$ is called a prefix (resp., suffix) if $i = 1$ (resp., $j = |w|$). The reversed string
of $w$ is denoted by $w^R$, namely, $w^R = w[|w|]\,w[|w|-1] \cdots w[2]\,w[1]$.

Let $T$ be a string of length $n$ over $\Sigma$. We consider the following three compression
schemes for $T$.

LZ77: LZ77 is characterized by the greedy factorization $T = f_1 f_2 \cdots f_z$ of $T$. The
$i$th factor $f_i$ is a single character if the character does not appear in $f_1 f_2 \cdots f_{i-1}$, and
otherwise, the longest prefix of $T[|f_1 f_2 \cdots f_{i-1}| + 1 \ldots]$ such that there is another
occurrence $s_i$ of $f_i$ with $s_i \le |f_1 f_2 \cdots f_{i-1}|$. The position $s_i$ is called the reference
position of the $i$th LZ77 factor $f_i$. We can store $T$ in $O(z)$ space because each factor $f_i$
(in the second case) can be replaced with a pair $(s_i, |f_i|)$.
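
The greedy factorization can be sketched directly from this definition. The following Python code is a naive quadratic-time illustration, not one of the algorithms surveyed in this chapter; the function names `lz77_factorize` and `lz77_decode` are hypothetical. A factor is returned either as a single character or as a pair (reference position, length) with a 1-based reference position, and the earlier occurrence is allowed to overlap the factor itself.

```python
def lz77_factorize(text):
    """Greedy LZ77 factorization; factors are characters or (s_i, |f_i|) pairs."""
    factors, k, n = [], 0, len(text)
    while k < n:
        length, source = 0, -1
        while k + length < n:
            # Look for text[k : k+length+1] starting at a position < k.
            s = text.find(text[k:k + length + 1], 0, k + length)
            if s == -1:
                break
            length, source = length + 1, s
        if length == 0:
            factors.append(text[k])               # first occurrence of a character
            k += 1
        else:
            factors.append((source + 1, length))  # 1-based reference position
            k += length
    return factors


def lz77_decode(factors):
    """Rebuild the text, copying character by character so overlaps work."""
    out = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            s, length = f
            for t in range(length):
                out.append(out[s - 1 + t])
    return "".join(out)


print(lz77_factorize("abababa"))
```

For the input `abababa` the sketch produces two single-character factors followed by one self-overlapping factor of length 5, so $z = 3$.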

BWT, RLBWT: For simplicity, we assume that $T$ is extended by the end marker \$,
which is a special character not in $\Sigma$ and lexicographically smaller than any character
in $\Sigma$, that is, $T[n+1] = \$$. The Burrows-Wheeler transform [8] is a permutation $L$
of characters in $T[1 \ldots n+1]$ obtained as follows: $L[i]$ is the character preceding
the lexicographically $i$th smallest suffix among all non-empty suffixes of $T$ with the
exception that $L[i] = \$$ when the $i$th smallest suffix is $T$ itself (and therefore has
no preceding character). The resulting string $L$ can be interpreted as a sequence
obtained by sorting characters in $T$ according to their context (succeeding suffixes).
Since characters sharing similar context tend to be identical, $L$ is well compressible
by run-length encoding. The run-length encoded BWT is called RLBWT.

Let $\mathrm{SA}[1 \ldots n+1]$ denote the suffix array of $T[1 \ldots n+1]$, where $\mathrm{SA}[i]$ is the
starting position of the lexicographically $i$th smallest suffix. We consider SA as
a mapping from BWT position to text position and say that the BWT position $i$
corresponds to the text position $\mathrm{SA}[i]$. One crucial operation on the BWT string
$L$ is the so-called LF mapping that maps a BWT position $i$ to the BWT position
corresponding to text position $\mathrm{SA}[i] - 1$. LF mapping can be implemented by a
rank data structure on $L$ that returns the number of occurrences of a character $c$ in
$L[1 \ldots i]$ for any character $c$ and BWT position $i$.
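
The definitions of SA, BWT, RLBWT, and LF mapping can be checked on a small example. The Python sketch below builds everything naively by sorting suffixes, so it is only meant to illustrate the definitions and uses 0-based positions; the function names are hypothetical, and the end marker is assumed to be the ASCII character `$`, which sorts before letters.

```python
from itertools import groupby


def suffix_array(t):
    """0-based start positions of the suffixes of t in lexicographic order."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def bwt_from_sa(t, sa):
    """L[i] is the character preceding the i-th smallest suffix ('$' for t itself)."""
    return "".join(t[i - 1] if i > 0 else "$" for i in sa)


def run_length_encode(s):
    """List of (character, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def lf_mapping(L):
    """LF[i] = (number of characters smaller than L[i]) + rank of L[i] in L[0..i] - 1."""
    first, total = {}, 0
    for ch in sorted(set(L)):
        first[ch] = total
        total += L.count(ch)
    seen, lf = {}, []
    for ch in L:
        lf.append(first[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    return lf


t = "banana" + "$"
sa = suffix_array(t)
L = bwt_from_sa(t, sa)
lf = lf_mapping(L)
print(sa, L, run_length_encode(L), lf)
```

On this example the BWT has $r = 5$ runs, and following `lf` from any BWT position moves to the BWT position whose text position is one smaller (wrapping around at the start of the text).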

By using LF mapping, we can also support backward search. For any string $w$
that appears in $T$, there is a unique maximal interval $[b \ldots e]$ such that the
lexicographically $i$th suffix is prefixed by $w$ iff $i \in [b \ldots e]$. Note that $e - b + 1$ is the
number of occurrences of $w$ in $T$ and the text positions corresponding to these positions
represent the occurrences of $w$. A single step of the backward search computes
the $cw$-interval from the $w$-interval by using the same mechanism as LF mapping,
where $c$ is a character. The index based on backward search on BWT is known as the
FM-index [14]. Although it was previously known that the occurrences of a pattern
can be counted by a backward search implemented in RLBWT space [41], it was
recently reported that RLBWT can be augmented with an $O(r)$-space data structure
to report all the occurrences of the pattern efficiently. The index based on RLBWT
is called the r-index [17].
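
A single backward-search step can be sketched as follows. This Python code is a naive illustration on the plain BWT (not in RLBWT space, and not the r-index); `rank` is computed by scanning, the interval is half-open and 0-based, and all function names are hypothetical.

```python
def bwt_and_sa(text):
    """Naive BWT and suffix array of text + '$' (0-based positions)."""
    t = text + "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    return "".join(t[i - 1] for i in sa), sa   # t[-1] is '$' when i == 0


def backward_search(L, pattern):
    """Return the half-open interval [b, e) of suffix ranks prefixed by pattern."""
    first, total = {}, 0
    for ch in sorted(set(L)):
        first[ch] = total                      # characters in L smaller than ch
        total += L.count(ch)

    def rank(c, i):
        return L[:i].count(c)                  # occurrences of c in L[0..i-1]

    b, e = 0, len(L)                           # interval of the empty string
    for c in reversed(pattern):
        if c not in first:
            return (0, 0)
        b, e = first[c] + rank(c, b), first[c] + rank(c, e)
        if b >= e:
            return (0, 0)
    return (b, e)


L, sa = bwt_and_sa("banana")
b, e = backward_search(L, "ana")
print(e - b, sorted(sa[b:e]))
```

Here `e - b` is the number of occurrences of the pattern, and `sa[b:e]` lists their starting positions in the text.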

Grammar compression: Grammar compression is a general framework of data
compression in which a context-free grammar (CFG) $S = (\Sigma, V, D)$ that derives a
single string $T$ is considered to be a compressed representation of $T$, where $\Sigma$ is the
set of characters (terminals), $V$ is the set of variables (non-terminals), $D$ is the set of
deterministic production rules whose right-hand sides are strings over $(V \cup \Sigma)$, and
the last variable derives $T$.¹ The compressed size of $S$ is expressed by the sum of
the lengths of right-hand sides of the production rules in $S$. We consider run-length
encoding right-hand sides of CFGs, and call such CFGs run-length encoded CFGs
(RLCFGs). The compressed size of an RLCFG is expressed by the sum of run-length
encoded sizes of right-hand sides of the production rules.
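
As a small illustration of these size measures, the Python sketch below represents an RLCFG as a dictionary from variables to lists of (symbol, exponent) runs. This representation, the function names, and the convention that each run contributes one unit to the run-length encoded size are assumptions made for the example.

```python
def expand(rules, symbol):
    """Derive the string generated by symbol; symbols without a rule are terminals."""
    if symbol not in rules:
        return symbol
    return "".join(expand(rules, s) * power for s, power in rules[symbol])


def rlcfg_size(rules):
    """Run-length encoded size: every run on a right-hand side counts once."""
    return sum(len(rhs) for rhs in rules.values())


def cfg_size(rules):
    """Size of the plain CFG obtained by writing every run out in full."""
    return sum(power for rhs in rules.values() for _, power in rhs)


# X1 -> a b, X2 -> X1^4; the last variable X2 derives the text.
rules = {"X1": [("a", 1), ("b", 1)], "X2": [("X1", 4)]}
print(expand(rules, "X2"), rlcfg_size(rules), cfg_size(rules))
```

The grammar derives `abababab`; under the stated convention its run-length encoded size is 3, whereas the same rules written without exponents have total right-hand-side length 6.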


Algorithm 1: Supposing that we have parsed suffix T[p + 1...], compute the
length of the next LZ77 factor ending at p.
  p' ← p;
  w ← ε;
  c ← T[p'];
  while cw-interval contains a text position larger than p' do
      p' ← p' − 1;
      w ← cw;
      c ← T[p'];
  return max(1, p − p');


6.1.2 RLBWT to LZ77 


Algorithms to compute LZ77 from RLBWT are considered in [3, 32, 46, 47,
49]. An essential task when computing LZ77 is to search for the longest prefix
of $T[|f_1 f_2 \cdots f_{i-1}| + 1 \ldots]$ that occurs before and compute an occurrence
$s_i \le |f_1 f_2 \cdots f_{i-1}|$ of $f_i$. The basic idea is to use the backward search on RLBWT of
$T^R$ to perform this task. One difficulty is ignoring the BWT positions that correspond
to the suffixes starting after $|f_1 f_2 \cdots f_{i-1}|$ during the backward search. In [49], it is
shown that keeping at most $2r$ BWT positions is sufficient to compute the longest
prefix and a reference position for the LZ77 factor. This subsection gives a brief
review of this idea.


1 We treat the last variable as the starting variable.

For the sake of this explanation, consider the case of LZ77 parsing from right to
left (i.e., we conceptually compute the LZ77 factorization for $T^R$) so that backward
search on $T$ (instead of the reversed one) can be used. Supposing that we have parsed
suffix $T[p+1 \ldots]$, Algorithm 1 shows how to compute the length of the next factor
ending at $p$. To check whether the $cw$-interval contains a text position larger than
$p'$, we partition SA into $r$ subintervals, each of which is the LF-mapped interval of a
run of $L$, and maintain at most two positions for each subinterval. Suppose that the
$cw$-interval $[b \ldots e]$ is non-empty and $[b \ldots e]$ is covered by consecutive subintervals
$[b_1 \ldots e_1], [b_2 \ldots e_2], \ldots, [b_k \ldots e_k]$ with minimal integer $k$, that is,
$b_1 \le b < e_1 + 1 = b_2 < e_2 + 1 = b_3 < \cdots < e_{k-1} + 1 = b_k \le e \le e_k$. If $k = 1$, the characters of
$L$ in the $w$-interval consist of a single character $c$ and all positions in the $w$-interval are
LF-mapped to the $cw$-interval. Therefore, the $cw$-interval contains a text position larger than $p'$
iff the $w$-interval satisfies the condition in the previous step. For the case of $k > 1$, we
mark the closest positions from the boundaries of subintervals that correspond to text
positions larger than $p'$. Using this information, we can check whether $\mathrm{SA}[b_1 \ldots e_1]$
and/or $\mathrm{SA}[b_k \ldots e_k]$ contain a text position larger than $p'$. We also maintain the data
structure to check whether a subinterval in $[b_2 \ldots e_2], \ldots, [b_{k-1} \ldots e_{k-1}]$ contains
a text position larger than $p'$, and if so we compute which interval contains that
position.
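
The control flow of Algorithm 1 can be sketched without any BWT machinery. In the Python code below, the test "the $cw$-interval contains a text position larger than $p'$" is replaced by a naive substring search to the right of $p'$, so the sketch shows only the logic of the right-to-left parsing and not the $O(r)$-space data structures described above. Positions are 0-based and the function names are hypothetical.

```python
def next_factor_length(text, p):
    """text[p+1:] is already parsed; return the length of the factor ending at p."""
    q = p  # plays the role of p' in Algorithm 1
    # cw equals text[q : p+1]; its interval contains a text position larger
    # than q iff text[q : p+1] also occurs starting to the right of q.
    while q >= 0 and text.find(text[q:p + 1], q + 1) != -1:
        q -= 1
    return max(1, p - q)


def lz77_right_to_left(text):
    """Factor lengths of the right-to-left parsing, listed from the right end."""
    lengths, p = [], len(text) - 1
    while p >= 0:
        length = next_factor_length(text, p)
        lengths.append(length)
        p -= length
    return lengths


print(lz77_right_to_left("abab"))
```

For `abab` the parsing yields factor lengths 1, 1, 2 from the right, which matches the left-to-right LZ77 factorization of the reversed string `baba`.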

In this way, we can compute the lengths of LZ77 factors. The reference position for
each LZ77 factor can also be computed by maintaining text positions corresponding
to the marked positions in each subinterval. The data structures use only $O(r)$ words
of space.

In [46], the data structures are tuned to improve the time complexity. In [47], a fast 
implementation for the backward search in RLBWT space was proposed and applied 
to the above-mentioned algorithm. In [3], an online construction of r-index was 
proposed and the technique was extended to an online LZ77 factorization algorithm 
in RLBWT space. In [32], a different approach to converting RLBWT to LZ77 was 
proposed. 


6.1.3 Recompression on Grammar Compression 


Given that there are a number of CFGs with different properties for representing
strings, we may want to transform one CFG to another without explicitly decompressing
the text. In this subsection, we introduce a technique called recompression
which has proven to be a powerful tool in problems related to grammar compression
[26–28, 31] and word equations [29, 30].

In [27], Jeż proposed an algorithm TtoG for computing an RLCFG of $T$ in $O(N)$
time, where $N$ denotes the length of $T$. Let TtoG($T$) denote the RLCFG of $T$ produced
by TtoG. We use the term letters for characters and variables introduced by TtoG. A run
is called a block in this subsection. TtoG consists of two different types of compression,
namely, block compression (BComp) and pair compression (PComp).


BComp: Given a string $w$ over $\Sigma = [1 \ldots |w|]$, BComp compresses $w$ by replacing
all blocks of length $\ge 2$ with fresh letters. Note that BComp eliminates all
blocks of length $\ge 2$ in $w$.

PComp: Given a string $w$ over $\Sigma = [1 \ldots |w|]$ that contains no block of length
$\ge 2$, PComp compresses $w$ by replacing all pairs from $\acute{\Sigma}\grave{\Sigma}$ with fresh letters,
where $(\acute{\Sigma}, \grave{\Sigma})$ is a partition of $\Sigma$, that is, $\Sigma = \acute{\Sigma} \cup \grave{\Sigma}$ and $\acute{\Sigma} \cap \grave{\Sigma} = \emptyset$. Given the
frequency table of pairs, we can deterministically compute a partition of $\Sigma$ by
which at least $(|w| - 1)/4$ occurrences of pairs are replaced.
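
The two compression steps can be illustrated on an explicit string. The Python sketch below applies BComp and PComp to a list of integer letters. It takes the partition as an argument instead of computing it from the frequency table of pairs, and it works on the uncompressed string rather than on a grammar; the function names and the use of `itertools.count` as the source of fresh letters are assumptions made for the example.

```python
from itertools import count, groupby


def bcomp(w, fresh):
    """Replace every maximal block c^d with d >= 2 by a fresh letter."""
    out, rules = [], {}
    for letter, run in groupby(w):
        d = len(list(run))
        if d >= 2:
            if (letter, d) not in rules:
                rules[(letter, d)] = next(fresh)
            out.append(rules[(letter, d)])
        else:
            out.append(letter)
    return out, rules


def pcomp(w, left, right, fresh):
    """Replace every pair ab with a in left and b in right by a fresh letter."""
    out, rules, i = [], {}, 0
    while i < len(w):
        if i + 1 < len(w) and w[i] in left and w[i + 1] in right:
            pair = (w[i], w[i + 1])
            if pair not in rules:
                rules[pair] = next(fresh)
            out.append(rules[pair])
            i += 2           # pairs cannot overlap because left and right are disjoint
        else:
            out.append(w[i])
            i += 1
    return out, rules


fresh = count(4)                                    # letters 1..3 are already in use
w1, block_rules = bcomp([1, 1, 2, 3, 3, 3, 2, 1], fresh)
w2, pair_rules = pcomp(w1, {4, 5}, {1, 2}, fresh)   # hand-picked partition
print(w1, block_rules, w2, pair_rules)
```

In this run BComp shortens the string from 8 to 5 letters, and PComp with the hand-picked partition replaces two pairs, leaving 3 letters.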


TtoG compresses $T_0 = T$ by applying BComp and PComp in turns until the string
is shrunk down to a single letter. Because PComp compresses a given string by a
constant factor of 3/4, the height of TtoG($T$) is $O(\lg N)$.

TtoG performs level-by-level transformation of $T_0$ into strings $T_1, T_2, \ldots, T_{\hat{h}}$,
where $|T_{\hat{h}}| = 1$. If $h$ is even, the transformation from $T_h$ to $T_{h+1}$ is performed by
BComp, and production rules of the form $c \to \ddot{c}^d$ are introduced. If $h$ is odd, the
transformation from $T_h$ to $T_{h+1}$ is performed by PComp, and production rules of
the form $c \to \acute{c}\grave{c}$ are introduced. Let $\Sigma_h$ be the set of letters appearing in $T_h$.

The advantage of TtoG is that it can be simulated on $S = S_0 = (\Sigma_0, V, D_0)$
without decompression. We consider the level-by-level transformation of $S_0$ into
CFGs $S_1 = (\Sigma_1, V, D_1), S_2 = (\Sigma_2, V, D_2), \ldots, S_{\hat{h}} = (\Sigma_{\hat{h}}, V, D_{\hat{h}})$, where each $S_h$
generates $T_h$. More specifically, the compression from $T_h$ to $T_{h+1}$ is simulated on $S_h$.
We can correctly compute the letters introduced in each level $h + 1$ while modifying
$S_h$ into $S_{h+1}$; hence, we get all the letters of TtoG($T$) in the end. We note that
new "variables" are never introduced and modifications are made by rewriting the
right-hand sides of the original variables.

We now show how PComp is performed on $S_h$ for odd $h$. That is, we compute $S_{h+1}$
from $S_h$. Note that any occurrence $i$ of a pair $\acute{c}\grave{c}$ in $T_h$ can be uniquely associated with
a variable $X$ that is the label of the lowest node covering the interval $[i \ldots i+1]$ in
the derivation tree of $S_h$ (recall that $S_h$ generates $T_h$). We can compute the frequency
table of pairs by counting pairs associated with $X$ in $D_h(X)$ and multiplying it by
the number of occurrences of $X$ in the derivation tree of $S_h$. The frequency table is
used to compute a partition of $\Sigma_h$, which determines the pairs to be replaced. A pair
appears explicitly in right-hand sides or crosses the boundaries of variables. We can
modify $S_h$ so that all the crossing occurrences to be replaced appear explicitly in
some right-hand side, then replace the explicit occurrences to get $S_{h+1}$. In a similar
way, BComp can also be performed on $S_h$ for even $h$.

In [23], it is shown that TtoG($T$) can be used to answer the longest common
extension (LCE) queries and the transformation from arbitrary CFG $S$ to TtoG($T$)
is a key for efficient construction algorithms of LCE data structures in grammar
compressed space. In [53], the recompression technique is modified to transform
arbitrary CFG $S$ into the CFG obtained by the RePair algorithm [38]. RePair is
known to achieve the best compression performance in practice and there are many
studies on computing RePair in small space. Using online grammar compression
algorithms, such as [43, 57], the algorithm in [53] leads to the first RePair algorithm
working in compressed space.


6.2 Privacy-Preserving Similarity Computation 


6.2.1 Related Work 


This section reviews results in privacy-preserving information retrieval over
strings recently presented in [59]. As the number of strings containing personal
information has increased, privacy-preserving computation has become increasingly 
important. Secure computation based on public-key encryption is one of the great 
accomplishments of modern cryptography because it allows untrusted parties to 
compute a function based on their private inputs, while revealing nothing but the 
result. 

Rapid progress in gene sequencing technology has expanded the range of appli- 
cations of edit distance to include personalized genomic medicine, diagnosis of 
diseases, and preventive treatment (e.g., see [1]). However, because the genome of 
a person is ultimately personal information that uniquely identifies the owner, the 
parties involved should not share personal genomic data in plaintext. We therefore 
consider a secure two-party model for edit distance computation: Two untrusted par- 
ties generating their own public and private keys have strings x and y, respectively, 
and they want to jointly compute f(x, y) for a given metric f without revealing 
anything about their individual strings. 

Homomorphic encryption (HE) is an emerging technique for such secure multi-party
computation. HE is a kind of public-key encryption between two parties, Alice
and Bob, where Alice wants to send a secret message $m$ to Bob. In this model, Bob
generates his secret key $sk$ and public key $pk$ prior to communication, where
$pk$ is known to everyone. Alice then sends the encrypted message $E(m, pk)$ to Bob
and he recovers $m$ with his secret key $sk$ by using the property $E(E(m, pk), sk) = m$.
If it is not necessary to specify the owner of $pk$ and $sk$, we simply write $E(m)$.

A public-key encryption $E()$ has the additive homomorphic property if we can
obtain $E(m + n)$ from $E(m)$ and $E(n)$ without decryption, and the multiplicative
property is similarly defined. If $E()$ is additive, Alice can obtain the summation of
many people's secret numbers without revealing their private numbers.

The first public-key encryption algorithm RSA [51] is multiplicative because it
has the following property: Let $(e, n)$ be a public key and $(d, n)$ be a secret key,
respectively, where $e, d, n$ are integers. For a message $m$, its encryption is computed
by $c = m^e \bmod n$ and is decrypted by $c^d = m^{ed} = m \bmod n$. We can easily
check the multiplicative property $(m_1^e \bmod n) \cdot (m_2^e \bmod n) \equiv (m_1 m_2)^e \pmod{n}$.
The Paillier encryption system [48] was the first system to have the additive property.
This means that parties can jointly compute the encrypted value $E(x + y)$ directly
based on only two encrypted integers $E(x)$ and $E(y)$.
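
Both homomorphic properties can be observed numerically. The Python sketch below uses textbook RSA and a textbook Paillier variant with the generator $n + 1$; the tiny primes, the fixed random values, and all function names are assumptions chosen for illustration, and the code offers no security. The modular inverse via `pow(x, -1, n)` requires Python 3.8 or later.

```python
from math import gcd

# --- Textbook RSA: multiplicatively homomorphic ---
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))        # secret exponent


def rsa_enc(m):
    return pow(m, e, n)


def rsa_dec(c):
    return pow(c, d, n)


# --- Textbook Paillier with g = n + 1: additively homomorphic ---
def paillier_keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def paillier_enc(n, m, r):
    """Encrypt m with randomness r, which must be coprime to n."""
    n2 = n * n
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def paillier_dec(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


m1, m2 = 12, 34
print(rsa_dec(rsa_enc(m1) * rsa_enc(m2) % n))            # product of the plaintexts

pn, sk = paillier_keygen(61, 53)
c = paillier_enc(pn, m1, 7) * paillier_enc(pn, m2, 11) % (pn * pn)
print(paillier_dec(pn, sk, c))                           # sum of the plaintexts
```

Multiplying two RSA ciphertexts yields a ciphertext of the product $m_1 m_2$, while multiplying two Paillier ciphertexts yields a ciphertext of the sum $m_1 + m_2$ (both modulo $n$).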

By taking advantage of the homomorphic property, researchers have proposed HE- 
based privacy-preserving protocols for computing the Levenshtein distance d (x, y). 
For example, Inan et al. [25] designed a three-party protocol where two parties 
securely compute d(x, y) by enlisting the help of a reliable third party. Rane and 
Sun [50] then improved this three-party protocol to develop the first two-party pro- 
tocol. 

In this review, we focus on an extended Levenshtein distance called the edit
distance with moves (EDM) which allows any substring to be moved with unit cost
in addition to the standard operations of inserting, deleting, and replacing a character.
Based on the EDM, we can find a set of approximately maximal common substrings
appearing in two strings, which can be used to detect plagiarism in documents or long
repeated segments in DNA sequences. As an example, consider two unambiguously
similar strings $x = a^N b^N$ and $y = b^N a^N$, which can be transformed into each other
by a single move. While the exact EDM is simply $\mathrm{EDM}(x, y) = 1$, the Levenshtein
distance has the undesirable value $d(x, y) = 2N$. The $n$-gram distance is preferable
to the Levenshtein distance in this case, but it requires huge time/space complexity
depending on $N$.
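
The gap between the two distances in this example is easy to confirm for the Levenshtein side. The following Python sketch uses the standard dynamic program for the Levenshtein distance (the function name `levenshtein` is ours) and evaluates it on $a^N b^N$ and $b^N a^N$ for small $N$; the EDM side is not computed, since a single move of the block $a^N$ suffices by inspection.

```python
def levenshtein(x, y):
    """Standard row-by-row dynamic program for the Levenshtein distance."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                 # delete cx
                           cur[j - 1] + 1,              # insert cy
                           prev[j - 1] + (cx != cy)))   # replace or match
        prev = cur
    return prev[-1]


for N in range(1, 6):
    x, y = "a" * N + "b" * N, "b" * N + "a" * N
    print(N, levenshtein(x, y))
```

The printed distance grows as $2N$, whereas the edit distance with moves stays at 1 for every $N$.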

Although computation of $\mathrm{EDM}(x, y)$ is NP-hard [55], Cormode and Muthukrishnan
[12] were able to find an almost linear-time approximation algorithm. Many techniques
have been proposed for computing the EDM. For example, Gańczorz et al. [18]
proposed a lightweight probabilistic algorithm. In these algorithms, each string $x$ is
transformed into a characteristic vector $v_x$ consisting of nonnegative integers
representing the frequencies of particular substrings of $x$. For two strings $x$ and $y$, we then
have the approximate distance guaranteeing $L_1(v_x, v_y) = O(\lg^* N \lg N)\,\mathrm{EDM}(x, y)$
for $N = |x| + |y|$.

Appendix A of [15] points out a subtle flaw in the ESP algorithm [12] that achieves
this $O(\lg^* N \lg N)$ bound. However, this flaw can be remedied by an alternative
algorithm called HSP [15]. Because $\lg^* N$ increases extremely slowly,² we employ
$L_1(v_x, v_y)$ as a reasonable approximation to $\mathrm{EDM}(x, y)$.

Basically, the ESP tree is a special type of grammar compression referred to in the
previous section where the length of the right-hand side of any production rule is just
two or three. Therefore, $\mathrm{EDM}(x, y)$ is approximated by the compressed expressions
for the strings $x$ and $y$. The relationship between grammar compression (including
ESP) and its applications has been widely investigated in the past two decades (see,
e.g., [10, 21, 24, 39, 42, 52, 54, 56–58]).

Recently, Nakagawa et al. proposed the first secure two-party protocol for EDM
(SEDM) [44] based on HE. However, their algorithm suffers from a bottleneck during
the step where the parties construct a shared labeling scheme. Yoshimoto et al. improved
the previous algorithm to make it easier to use in practice [59]. We review the practical
algorithm here.


2 $\lg^* N$ is the number of times the logarithm function $\lg$ must be iteratively applied to $N$ until
the result is at most 1.


6.2.2 Edit Distance with Moves 


Based on the notation for strings in the previous section, $\mathrm{EDM}(S, S')$ is the length of
the shortest sequence of edit operations that transforms $S$ into $S'$, where the permitted
operations (each having unit cost) are inserting, deleting, or renaming one symbol
at any position, or moving an arbitrary substring. Unfortunately, as Theorem 6.1
states, computing $\mathrm{EDM}(S, S')$ is NP-hard even if the renaming operations are not
allowed [55], so we focus on an approximation algorithm for EDM, called
Edit-Sensitive Parsing (ESP) [12].


Theorem 6.1 (Shapira and Storer [55]) Determining EDM(x, y) is NP-hard even if 
only three unit-cost operations are allowed, namely, inserting a character, deleting 
a character, and moving a substring. 


ESP constructs a parsing tree, called an ESP tree, for a given string $S$, where
internal nodes are labeled consistently, that is, internal nodes have a common name if
and only if they derive the same string. After two ESP trees $T_S$ and $T_{S'}$ are constructed
for given strings $S$ and $S'$ for comparison in EDM, the characteristic vectors $v_S$ and
$v_{S'}$ are defined such that $v_S[i]$ is the frequency of the $i$th label in $T_S$. $\mathrm{EDM}(S, S')$ is
then approximated by $L_1(v_S, v_{S'})$ with the following lower/upper bounds.


Theorem 6.2 (Cormode and Muthukrishnan [12]) Let $T_S$ and $T_{S'}$ be consistently
labeled ESP trees for $S, S' \in \Sigma^*$, and let $v_S$ be the characteristic vector for $S$, where
$v_S[k]$ is the frequency of label $k$ in $T_S$. Then,

\[
\frac{1}{2}\,\mathrm{EDM}(S, S') \le L_1(v_S, v_{S'}) = O(\lg^* N \lg N)\,\mathrm{EDM}(S, S')
\]

for $L_1(v_S, v_{S'}) = \sum_{i=1}^{k} |v_S[i] - v_{S'}[i]|$.

In Fig. 6.1, we illustrate an example of consistent labeling of the trees $T_S$ and
$T_{S'}$ together with the resulting characteristic vectors. Since the strings $S$ and $S'$ are
parsed offline, the problem of preserving privacy is reduced to designing a secure
protocol for creating consistent labels and computing the $L_1$-distance between the
trees.

[Figure: ESP trees $T_S$ for string $S$ = adabcadeab and $T_{S'}$ for string $S'$ = eabcadadab, with consistently labeled internal nodes]

$v_S = (4, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0)$
$v_{S'} = (4, 2, 1, 2, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1)$

Fig. 6.1 Example of approximate EDM. For strings $S$ = adabcadeab and $S'$ = eabcadadab, $S$
is transformed into $S'$ by two moves of substrings, that is, $\mathrm{EDM}(S, S') = 2$. After constructing ESP
trees $T_S$ and $T_{S'}$ with consistent labeling, the corresponding characteristic vectors $v_S$ and $v_{S'}$ are
computed offline. The exact $\mathrm{EDM}(S, S')$ is approximated by $L_1(v_S, v_{S'}) = 4$
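
The value in the caption can be recomputed from the two characteristic vectors. The short Python sketch below evaluates the $L_1$-distance in plaintext (no encryption is involved); the function name `l1_distance` is ours.

```python
def l1_distance(v, w):
    """Sum of absolute coordinate-wise differences of two equal-length vectors."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(v, w))


v_s = [4, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
v_s_prime = [4, 2, 1, 2, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
print(l1_distance(v_s, v_s_prime))
```

The two vectors differ by 1 in each of the last four coordinates, which gives the distance 4 stated in Fig. 6.1 and is consistent with the bounds of Theorem 6.2 for $\mathrm{EDM}(S, S') = 2$.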


6.2.3 Homomorphic Encryption 


We now briefly review the framework of homomorphic encryption. Let $(pk, sk)$ be
a key pair for a public-key encryption scheme, and let $E_{pk}(x)$ be the encrypted value
of a message $x$ and $D_{sk}(C)$ be the decrypted value of a ciphertext $C$, respectively.
We say that the encryption scheme is additively homomorphic if we have the following
properties: (1) There is an operation $h_+(\cdot, \cdot)$ for $E_{pk}(x)$ and $E_{pk}(y)$ such
that $D_{sk}(h_+(E_{pk}(x), E_{pk}(y))) = x + y$. (2) For any $r$, we can compute the scalar
multiplication such that $D_{sk}(r \cdot E_{pk}(x)) = r \cdot x$.

An additive homomorphic encryption scheme that allows a sufficient number of
these operations is called an additive HE.³ Paillier's encryption scheme [48] is the
first secure additive HE. However, there are not many functions that can be evaluated
by using only additive homomorphism and scalar multiplication.

The multiplication $D_{sk}(h_\times(E_{pk}(x), E_{pk}(y))) = x \cdot y$ is another important
homomorphism. If we allow both additive and multiplicative homomorphism as well as
scalar multiplication (called a fully homomorphic encryption, FHE [19] for short),
it follows that we can perform any arithmetic operation on ciphertexts. For example,
if we can use a sufficiently large number of additive operations and a single
multiplicative operation over ciphertexts, we obtain the inner product of two encrypted
vectors.

However, there is a trade-off between the available homomorphic operations and
their computational cost. To avoid this difficulty, we focus on leveled HE (LHE) where
the number of homomorphic multiplications is restricted beforehand. In particular,
L2HE (additive HE that allows a single homomorphic multiplication) has attracted a
great deal of attention. The BGN encryption system is the first L2HE and was invented
by Boneh et al. [6] by assuming a single multiplication and sufficient numbers of
additions. Using BGN, we can securely evaluate formulas in disjunctive normal
form. Following this pioneering study, many practical L2HE protocols have been
proposed [2, 9, 16, 22].


3 In general, the number of applicable operations over ciphertexts is bounded by the size of $(pk, sk)$.

In terms of EDM computation, although Nakagawa et al. [44] introduced an 
algorithm for computing the EDM based on L2HE, their algorithm is very slow for 
large strings. Following on from this work, Yoshimoto et al. proposed another novel 
secure computation of EDM for large strings based on the faster L2HE proposed 
by Attrapadung et al. [2]. To our knowledge, there is no secure two-party protocol 
for EDM computation that uses only the additive homomorphic property. Whether 
we can compute EDM using a two-party protocol based on additive HE alone is an 
interesting question. 

For the benefit of the reader, we give a simple review of the mechanism used by
BGN, the first L2HE. For plaintexts $m_1, m_2 \in \{1, \ldots, M\}$ and their corresponding
ciphertexts $C_1$ and $C_2$, the ciphertexts of $m_1 + m_2$ and $m_1 m_2$ can be computed directly
from $C_1$ and $C_2$ without decrypting $m_1$ and $m_2$, provided $m_1 + m_2, m_1 m_2 \le M$.

For large primes $q_1$ and $q_2$, the BGN encryption scheme is based on two
multiplicative cyclic groups $G$ and $G'$ of order $q_1 q_2$, two generators $g_1$ and $g_2$ of $G$, an
inverse function $(\cdot)^{-1} : G \to G$, and a bihomomorphism $e : G \times G \to G'$. By
definition, $e(\cdot, x)$ and $e(x, \cdot)$ are group homomorphisms for all $x \in G$. In addition, we
assume that both the inverse function $(\cdot)^{-1}$ and the bihomomorphism $e$ can be
computed in polynomial time in terms of the security parameter $\log_2 q_1 q_2$. Such a system
$(G, G', g_1, g_2, (\cdot)^{-1}, e)$ can be generated by, for example, letting $G$ be a subgroup
of a supersingular elliptic curve and $e$ be a Tate pairing [6]. The BGN encryption
scheme proceeds as follows.

Key generation: Randomly generate two sufficiently large primes $q_1$ and $q_2$, then
use these to define $(G, G', g_1, g_2, (\cdot)^{-1}, e)$ as described above. Choose two random
generators $g$ and $u$ of $G$, set $h = u^{q_2}$, and let $M$ be a positive integer bounded above
by a polynomial function of the security parameter $\log_2 q_1 q_2$. The public key is then
$pk = (q_1 q_2, G, G', e, g, h, M)$ and the private key is $sk = q_1$.

Encryption: Encrypt the message $m \in \{0, \ldots, M\}$ using $pk$ and a random $r \in \mathbb{Z}_{q_1 q_2}$
as $C = g^{m} h^{r} \in G$, yielding the ciphertext $C$.

Decryption: Find the integer $m$ such that $C^{q_1} = (g^{m} h^{r})^{q_1} = (g^{q_1})^{m}$ using a
polynomial time algorithm. There is a known algorithm for this with time complexity of
$O(\sqrt{M})$.

Homomorphic properties: For the ciphertexts $C_1 = g^{m_1} h^{r_1}$ and $C_2 = g^{m_2} h^{r_2}$ in
$G$ corresponding to the messages $m_1$ and $m_2$, anyone can calculate the encrypted
value of $m_1 + m_2$ and $m_1 m_2$ directly from $C_1$ and $C_2$ without knowing $m_1$ and $m_2$,
as follows.

- Additive homomorphism:

$$C_a = C_1 C_2 h^{r} = (g^{m_1} h^{r_1})(g^{m_2} h^{r_2}) h^{r} = g^{m_1 + m_2} h^{r_1 + r_2 + r}$$

gives the encrypted value of $m_1 + m_2$.
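- Multiplicative homomorphism: $C_m = e(C_1, C_2) h^{r} \in G'$ gives the encrypted
value of $m_1 m_2$, because

$$C_m^{q_1} = \left[e(C_1, C_2)\, h^{r}\right]^{q_1} = \left[e(g_1, g_2)^{m_1 m_2}\right]^{q_1} = \left(e(g_1, g_2)^{q_1}\right)^{m_1 m_2},$$

where every other factor of $e(C_1, C_2) h^{r}$ contains $h = u^{q_2}$ and therefore becomes the
identity when raised to the power $q_1$. We decrypt $C_m$ by computing $m_1 m_2$ from
$(e(g_1, g_2)^{q_1})^{m_1 m_2}$ and $e(g_1, g_2)$.
Note that ciphertexts $C_1, C_2 \in G'$ also have additive homomorphic properties, so BGN allows
a single multiplication and unlimited additions over ciphertexts.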
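
The additive part of the scheme can be tried out with a small numerical sketch. The code below is only an illustration under strong simplifying assumptions that are not in the text: it uses the order-$q_1 q_2$ subgroup of $\mathbb{Z}_{859}^*$ with the toy primes $q_1 = 11$ and $q_2 = 13$ instead of an elliptic-curve group, it has no pairing (so no multiplication), it decrypts by brute force rather than in $O(\sqrt{M})$ time, and all function names are hypothetical. It is completely insecure.

```python
import random

Q1, Q2 = 11, 13      # toy primes (assumption); the private key is Q1
N = Q1 * Q2          # group order q1 * q2
P = 859              # prime with N | P - 1, so Z_P^* has a subgroup G of order N
M = 10               # message bound; kept below Q2 so decryption is unique here


def find_generator(start):
    """Return (g, x) where g = x^((P-1)/N) mod P has order exactly N."""
    for x in range(start, P):
        g = pow(x, (P - 1) // N, P)
        # The order of g divides N = Q1 * Q2; rule out orders 1, Q1 and Q2.
        if pow(g, Q1, P) != 1 and pow(g, Q2, P) != 1:
            return g, x
    raise ValueError("no generator found")


G_GEN, _x = find_generator(2)
U_GEN, _ = find_generator(_x + 1)
H = pow(U_GEN, Q2, P)            # h = u^{q2}, hence h^{q1} = 1


def encrypt(m):
    """C = g^m h^r for a random r."""
    r = random.randrange(N)
    return pow(G_GEN, m, P) * pow(H, r, P) % P


def decrypt(c):
    """Find m with C^{q1} = (g^{q1})^m by brute force over 0..M."""
    target = pow(c, Q1, P)
    base = pow(G_GEN, Q1, P)
    for m in range(M + 1):
        if pow(base, m, P) == target:
            return m
    raise ValueError("plaintext out of range")


def add(c1, c2):
    """Additive homomorphism: C1 * C2 * h^r encrypts m1 + m2."""
    return c1 * c2 * pow(H, random.randrange(N), P) % P
```

For example, `decrypt(add(encrypt(3), encrypt(4)))` recovers 7 without either ciphertext being decrypted on its own.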


6.2.4 L2HE-Based Algorithm for Secure EDM 


We now explain the algorithm for computing approximate EDM based on L2HE [59].
Two parties $A$, $B$ have strings $S_A$, $S_B$, respectively. First, they compute the
corresponding ESP trees $T_A$ and $T_B$ offline and they assign tentative labels to internal nodes
of $T_A$ and $T_B$ using a hash function $h : X \to \{1, 2, \ldots, n\}$ for $X \subseteq \{0, 1, \ldots, m\}$
of $n$ different labels in $T_A$ and $T_B$ with a fixed $m$. The goal is to securely relabel $X$
using a bijection $X \to \{1, 2, \ldots, n\}$, as described in Algorithm 2. We suppose that
$A$ and $B$ generate their own public and private keys prior to the computation.

In Algorithm 2, we assume an L2HE scheme allowing a single multiplicative
operation and a sufficient number of additive operations over encrypted integers.
Because these operations are usually implemented by AND ($\cdot$) and XOR ($\oplus$) logic
gates (e.g., [7]), we introduce the following notation for these gates. First, $E_A(x)$
denotes the ciphertext generated by encrypting plaintext $x$ with $A$'s public key,
and $E_A(x, y, z)$ is an abbreviation for the vector $(E_A(x), E_A(y), E_A(z))$. Here,
$E_A(x, y, z) \cdot E_A(a, b, c)$ denotes $(E_A(x \cdot a), E_A(y \cdot b), E_A(z \cdot c))$ and
$E_A(x, y, z) \oplus E_A(a, b, c)$ denotes $(E_A(x \oplus a), E_A(y \oplus b), E_A(z \oplus c))$ for each bit
$x, y, z, a, b, c \in \{0, 1\}$. Using this notation, we describe the proposed protocol in Algorithm 2.

Next, we define the protocol security based on a model where both parties are 
assumed to be semi-honest, that is, corrupt parties merely cooperate to gather infor- 
mation out of the protocol, but do not deviate from the protocol specification. The 
security is defined as follows. 


Definition 6.1 (Semi-honest security [20]) A protocol is secure against semi-honest 
adversaries if each party's observation of the protocol can be simulated using only 
the input they hold and the output that they receive from the protocol. 


Intuitively, this definition tells us that a corrupt party is unable to learn any extra 
information that cannot be derived from the input and output explicitly (for details, 
see [20]). Under this assumption, since the algorithm is symmetric with respect 
Algorithm 2 for consistently labeling $T_A$ and $T_B$ [59]


Preprocessing (tentative labeling): Parties $A$ and $B$ agree to use a hash function $H$ with a
range $\{0, \ldots, m\}$ for sufficiently large $m$. Both parties compute $T_A$ and $T_B$ corresponding to
their respective strings offline. The label of internal node $u$ is assigned $H(s(u))$, where $s(u)$ is the
string of all leaves of $u$. Now, parties $A$ and $B$ have tentative label sets $[T_A], [T_B] \subseteq \{0, \ldots, m\}$,
respectively.


Goal: Change all the labels using a bijection $[T_A] \cup [T_B] \to \{1, \ldots, n\}$ without either party
having to reveal anything about their private strings.


Notation: $E_A(x)$ denotes the ciphertext of a message $x$ encrypted by an L2HE with $A$'s public
key.


Sharing a dictionary:

Step 1: Party $A$ computes the bit vector $X[1 \ldots m]$ such that $X[\ell] = 1$ iff $\ell \in [T_A]$. Similarly,
party $B$ computes $Y[1 \ldots m]$ such that $Y[\ell] = 1$ iff $\ell \in [T_B]$.

Step 2: $A$ sends $E_A(X)$ to $B$ and $B$ sends $E_B(Y)$ to $A$.

Step 3: $B$ computes $(E_A(X) \oplus E_A(Y)) \oplus (E_A(X) \cdot E_A(Y)) = E_A(X \cup Y)$ and $A$ computes
$(E_B(X) \oplus E_B(Y)) \oplus (E_B(X) \cdot E_B(Y)) = E_B(X \cup Y)$.


Relabeling $[T_A]$ using $E_B(X \cup Y)$ ($[T_B]$ is relabeled in a symmetrical fashion)

Step 4: $A$ computes $E_B(L_\ell) = E_B\left(\sum_{i=1}^{\ell} (X \cup Y)[i]\right)$ for all $\ell \in [T_A]$.
Step 5: $A$ sends all $E_B(L_\ell + r_\ell)$ to $B$, choosing $r_\ell$ uniformly at random from N.
Step 6: $B$ decrypts all $L_\ell + r_\ell$ and sends them back to $A$.
Step 7: $A$ recovers $L_\ell \in \{1, \ldots, n\}$ for all $\ell \in [T_A]$ by subtracting $r_\ell$.


to $A$ and $B$, the following theorem proves the security of our algorithm against
semi-honest adversaries.


Theorem 6.3 (Yoshimoto et al. [59]) Let $[T_A]$ be the set of labels appearing in $T_A$.
The only knowledge that a semi-honest $A$ can gain by executing Algorithm 2 is the
distribution of the labels $\{L_\ell \mid \ell \in [T_A]\}$ over $[1, \ldots, n]$.


Theorem 6.4 (Yoshimoto et al. [59]) Algorithm 2 assigns consistent labels using
the injection $[T_A] \cup [T_B] \to \{1, 2, \ldots, n\}$ without revealing the parties' private
information. It has round and communication complexities of $O(1)$ and $O(\alpha(n \lg n +
m + rn))$, respectively, where $n = |[T_A] \cup [T_B]|$, $m$ is the modulus of the rolling hash
used for preprocessing, $r = \max(r_1, \ldots, r_n)$ is the security parameter, and $\alpha$ is the
cost of executing a single encryption, decryption, or homomorphic operation.
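
The data flow of Algorithm 2 can be checked with a plaintext mock. The sketch below performs Steps 1 to 7 on unencrypted bits, so it offers no privacy at all; it only illustrates why OR built from XOR and AND followed by prefix sums yields consistent labels. The function name `relabel`, the 16-bit mask range, and the use of Python sets for $[T_A]$ and $[T_B]$ are assumptions made for this example.

```python
import random


def relabel(labels_a, labels_b, m):
    """Plaintext mock of Algorithm 2 for tentative labels in {1, ..., m}."""
    # Step 1: characteristic bit vectors X and Y.
    X = [1 if l in labels_a else 0 for l in range(1, m + 1)]
    Y = [1 if l in labels_b else 0 for l in range(1, m + 1)]
    # Step 3: bitwise OR expressed with XOR and AND only.
    union = [(x ^ y) ^ (x & y) for x, y in zip(X, Y)]
    # Step 4: L_l is the prefix sum of the union vector up to position l.
    prefix, total = [], 0
    for bit in union:
        total += bit
        prefix.append(total)

    def recover(own_labels):
        result = {}
        for l in sorted(own_labels):
            r = random.randrange(1 << 16)   # Step 5: random mask r_l
            masked = prefix[l - 1] + r      # Step 6: value the other party would decrypt
            result[l] = masked - r          # Step 7: remove the mask
        return result

    return recover(labels_a), recover(labels_b)
```

With `labels_a = {3, 7, 9}`, `labels_b = {2, 7}` and `m = 10`, the shared label 7 is mapped to 3 for both parties, and the four distinct labels receive 1 to 4.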
Table 6.1 Comparison of the communication and round complexities of secure EDM computation
models [44, 59] as well as a naive algorithm. Here, $N$ is the total length of both parties' input strings,
$n$ is the number of characteristic substrings determining the approximate EDM, and $m$ is the range
of the rolling hash $H(\cdot)$ for the substrings satisfying $m > n$. “Naive” is the baseline method that
uses $H(\cdot)$ as the labeling function for the characteristic substrings


Method                   Communication      Round
Naive                    O(m lg m)          O(1)
Nakagawa et al. [44]     O(n lg n)          O(lg N)
Yoshimoto et al. [59]    O(n lg n + m)      O(1)


6.2.5 Result and Open Question 


The complexities of related algorithms are summarized in Table 6.1. Computing
the approximate EDM involves two phases: the shared labeling of characteristic
substrings (Phase 1) and the $L_1$-distance computation of characteristic vectors (Phase
2).

Let the parties have strings $x$ and $y$, respectively. In the offline case (i.e., there is no
need for privacy-preserving communication), they construct the respective parsing
trees $T_x$ and $T_y$ by the bottom-up parsing called ESP [12], where the node labels
must be consistent, meaning that two labels are equal if they correspond to the same
substring. In such an ESP tree, a substring derived by an internal node is called a
characteristic substring. In a privacy-preserving model, the two parties need to jointly
compute these consistent labels without revealing whether a characteristic substring
is common to both of them (Phase 1). After computing all the labels in $T_x$ and $T_y$,
they jointly compute the $L_1$-distance of two characteristic vectors containing the
frequencies of all labels in $T_x$ and $T_y$ (Phase 2).
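
In the offline case, Phase 2 reduces to a few lines. The following sketch assumes each tree is given simply as the list of its (already consistent) node labels; the function name `l1_distance` is hypothetical, and the secure version would perform the same arithmetic over ciphertexts.

```python
from collections import Counter


def l1_distance(labels_x, labels_y):
    """L1 distance between the label-frequency vectors of two trees."""
    fx, fy = Counter(labels_x), Counter(labels_y)
    # A Counter returns 0 for labels that do not occur in one of the trees.
    return sum(abs(fx[k] - fy[k]) for k in fx.keys() | fy.keys())
```

For the label lists `[1, 2, 2, 3]` and `[2, 3, 3, 4]`, the frequency vectors differ by one in each of the four labels, giving a distance of 4.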

As reported in [44], a bottleneck exists in Phase 1. The task is to design a bijection
$f : X \cup Y \to \{1, 2, \ldots, n\}$ where $X$ and $Y$ ($|X \cup Y| = n$) are the sets of characteristic
substrings for the parties, respectively. Since $X$ and $Y$ are computable without
communication, the goal is to jointly compute $f(w)$ for any $w \in X$ without revealing
whether $w \in Y$. This problem is closely related to the private set operation (PSO)
where parties possessing their private sets want to obtain the results for several set 
operations, such as intersection or union. Applying the Bloom filter [5] and HE tech- 
niques, various protocols for PSO have been proposed [4, 13, 35]. However, these 
protocols are not directly applicable to our problem because they require at least 
three parties for the security constraints. In contrast, the algorithm reviewed here 
introduced a novel secure two-party protocol for Phase 1. 

As shown in Table 6.1, the recent result eliminates the $O(\lg N)$ round complexity:
the proposed method achieves $O(1)$ round complexity while maintaining
the efficiency of communication complexity. Furthermore, the practical performance
of the algorithms for real DNA sequences was reported in [44].
Chapter 7
Compression and Pattern Matching


Takuya Kida and Isamu Furuya 


Abstract We introduce our research on compressed pattern matching technology 
that combines data compression and pattern matching. To show the results of this 
work, we explain the collage system proposed by Kida et al. in 2003 that is a unifying 
framework for compressed pattern matching, and we explain the Repair-VF method 
proposed by Yoshida and Kida in 2013 and the MR-Repair method proposed by 
Furuya et al. in 2019 as grammar compressions suitable for compressed pattern 
matching. 


7.1 Introduction 


Data compression is a technology that reduces the space used to store data by com- 
pactly expressing the redundancy contained in the data. It is mainly used for efficiently 
storing large amounts of data and reducing communication costs. If we consider the 
conversion of information to digital data as a kind of data compression, it has a long 
history that can be traced back to the Morse code developed in the 1830s. Many 
compression methods have been proposed depending on the type and application of 
data [43–46].

Information retrieval has also long been studied as a technique for efficiently 
finding a target part from a large-scale dataset or data group [3, 11, 13, 14, 30, 
39, 55], and there are various methods depending on the required specifications. In 
particular, the approaches differ between searching for images and audio data and 
searching for documents (text). In this chapter, we focus on the latter task of text 
searching. 
One of the basic problems in text searching is the pattern matching problem,
which is also called the string matching problem. This is the problem of finding the 
occurrences of keywords (patterns) in a target text. Broadly speaking, there are two 
approaches to solving this problem. One is to build an index for the input text in 
advance, which is called text indexing. A text index allows for efficient searching 
when the target text is static and is not subsequently updated. The other is to access 
the input text sequentially from the beginning to the end while checking if the given 
pattern matches at the current reference position in the text. This is called text scan- 
ning. Text scanning is applicable even if the text is updated from time to time and 
it does not require index structures. In general, text indexing is superior in terms 
of search speed, while text scanning is superior in terms of search flexibility. By 
convention, “pattern matching” often refers to the latter, text scanning, in a narrow 
sense. 

In this chapter, we outline a fusion technology of data compression and pattern 
matching called compressed pattern matching. First, in Sect. 7.2, we look back on
the history of this field of study. Then, in Sect. 7.3, we provide some notation and
definitions that are used in the following sections. In addition, we recall grammar
compression, which is the key compression scheme for compressed pattern matching.
Next, in Sect. 7.4, we introduce the general framework of compressed pattern
matching proposed by Kida et al. [21]. In Sects. 7.5 and 7.6, we present outlines of 
Repair-VF and MR-Repair, respectively, which are the results of our work in this 
study. Finally, we conclude in Sect. 7.7. 


7.2 History of Compressed Pattern Matching Research 


The technology of combining data compression and pattern matching emerged in
the early 1990s. This has come to be known as the compressed pattern matching
problem [1], which is the problem of performing pattern matching without first
decompressing the compressed input text. Formally, when a text $T = t_1 t_2 \ldots t_u$ ($t_i$
is a symbol) is given in compressed form $Z = z_1 z_2 \ldots z_n$ ($z_i$ is an element of the
compressed text), and a pattern $P$ of length $m$ is given, the problem is to find all
occurrences of $P$ in $T$ using only $Z$ and $P$. A simple method is to first decompress $Z$ to
$T$ and then use some commonly used pattern matching algorithm. However, this approach
requires $O(m + u)$ time for pattern matching in addition to the decompression time.

The optimal algorithm for the compressed pattern matching problem is one that
performs pattern matching in $O(m + n)$ time in the worst case. However, it is not
easy to achieve both efficient compression of text data and fast pattern matching on
it. In the initial research in this field, individual pattern matching algorithms were
developed for each compression method. For example, Eilam-Tzoreff and Vishkin
[15] proposed an algorithm for run-length compression, Gasieniec et al. [18] and
Farach and Thorup [16] proposed algorithms for LZ77, and Amir et al. [2] proposed
an algorithm for LZW [54]. However, these algorithms tend to be complicated, and
as Manber [29] pointed out, it is questionable whether they are more practical
than the simple method.

From the late 1990s to the early 2000s, several efficient methods for compressed 
pattern matching emerged [22, 38, 41]. For the first time, it was shown experimentally 
that these methods can perform pattern matching faster than the simple method. 
Furthermore, methods have appeared that can perform pattern matching faster by 
about the compression ratio than matching on the original text. The key is to select a 
compression method suitable for pattern matching even at the expense of compression 
ratio. In fact, Byte Pair Encoding (BPE), which was used by Shibata et al. [48] for 
this purpose, has a compression ratio of at most about 50% for natural language 
texts, while the LZ-family methods can compress the same texts to about 30% or 
less. However, text compression by BPE offers an advantage for pattern matching
because all the codewords are fixed at 8 bits and the correspondence between each 
codeword and a portion of the text is relatively clear. 

This caused a paradigm shift. Whereas individual pattern matching algorithms 
were previously developed for each data compression method, we realized that in 
order to increase the matching speed it would be better to develop a new data com- 
pression method suitable for pattern matching. In fact, in the 2000s, several data 
compression methods were proposed for this purpose. 

One of the main groups of compression methods based on this idea consists of the
methods proposed by Brisaboa et al. [7–9] and by Klein and Ben-Nissan [24].
These are based on a technique called dense coding [8]. Dense coding
divides an input (natural language) text into words, and then encodes them so that 
the codewords become shorter in descending order of the frequency of the words. In 
addition, each codeword is assigned a bit pattern that has an explicit end to facilitate 
codeword extraction. Although dense coding offers good performance in terms of 
both compression ratio and pattern matching speed, some ingenuity is required to 
apply it to data such as DNA sequences that cannot be divided into words. 

The other system is grammar-based compression (or grammar compression) [23] 
with fixed-length coding. This idea is an extension of BPE and can be applied even 
if an input text cannot be separated into words. One direct improvement of BPE 
is a method using a context-sensitive grammar by Maruyama et al. [34], while for 
compression methods based on context-free grammar we have the methods by Klein 
and Shapira [25] and Uemura et al. [51]. In both methods, a modified version of 
suffix tree [53] is used as a dictionary tree for constructing grammar. 

In this chapter, for convenience, we refer to the former system as the dense coding 
system and the latter as the VF coding system. 

In the 2010s, new data compression algorithms began to appear that achieved 
compression performance comparable to well-known compression tools such as Gzip 
and Bzip while maintaining properties suitable for pattern matching. Among the two 
systems described above, the VF coding system has difficulties in terms of com- 
pression rate and compression speed. Therefore, research looked into searching for 
and improving grammar compression, which is the basis of the VF coding system. 
Among this work, the algorithm RePair [26], which was proposed before the name 
“grammar compression” was used, has attracted attention because of its excellent 
compression ratio. Yoshida and Kida [56] proposed a variant of RePair, called
Repair-VF, which limits the loss in compression ratio by suppressing unnecessary
grammar rules while encoding the output using fixed-length codewords. The
time and space complexities required for Repair-VF are both $O(n)$ for a text of length
$n$, which is the same as the original RePair. Repair-VF realizes high-speed pattern
matching on compressed text while having a compression ratio comparable to
Gzip.

Very recently, we proposed a novel variant of RePair, called MR-RePair [17], 
which constructs more compact grammars than RePair, particularly for highly repet- 
itive texts. This achievement comes from an analysis of RePair. We show in [17] that 
the main process of RePair, that is, the step by step substitution of the most frequent 
symbol pairs, works within the corresponding most frequent maximal repeats. We 
then reveal in [17] the relationship between maximal repeats and grammars con- 
structed by RePair. 


7.3 Preliminaries 
7.3.1 Definitions of Notation and Terms 


Let $\Sigma$ be an alphabet, that is, an ordered finite set of symbols. An element $T = t_1 \ldots t_n$
of $\Sigma^*$ is called a string or a text, where $|T| = n$ denotes its length. Let $\varepsilon$ be an empty
string of length 0, that is, $|\varepsilon| = 0$. We denote a concatenation of two strings $x, y \in \Sigma^*$
by $x \cdot y$, or $xy$ for simplicity if no confusion occurs.

If $T = xyz$ with $x, y, z \in \Sigma^*$, then $x$, $y$, $z$ are called a prefix, substring, and suffix
of $T$, respectively. Let $T[i : j] = t_i \cdots t_j$ for any $1 \le i \le j \le n$ denote a substring
of $T$ beginning at $i$ and ending at $j$ in $T$, and let $T[i] = t_i$ denote the $i$-th symbol of
$T$. Let $T[i : j] = \varepsilon$ if $j < i$ for simplicity.


7.3.2 Grammar Compression 


A context-free grammar (CFG or simply grammar) $G$ is defined as a four-tuple $G =
(V, \Sigma, S, R)$, where $V$ denotes an ordered finite set of variables, $\Sigma$ denotes an ordered
finite alphabet, $R$ denotes a finite set of binary relations called production rules (or
rules) between $V$ and $(V \cup \Sigma)^*$, and $S \in V$ denotes a special variable called start
variable. A production rule refers to the situation where a variable is substituted and
written in the form $v \to w$, with $v \in V$ and $w \in (V \cup \Sigma)^*$. Let $X, Y \in (V \cup \Sigma)^*$. If
there are $x_l, x, x_r, y \in (V \cup \Sigma)^*$ such that $X = x_l x x_r$, $Y = x_l y x_r$, and $x \to y \in R$,
we write $X \Rightarrow Y$, and denote the reflexive transitive closure of $\Rightarrow$ as $\Rightarrow^*$. Let $val(v)$ be
a string derived from $v$, that is, $v \Rightarrow^* val(v)$. We define grammar $\hat{G} = (\hat{V}, \hat{\Sigma}, \hat{S}, \hat{R})$
as a subgrammar of $G$ if $\hat{V} \subseteq V$, $\hat{\Sigma} \subseteq (V \cup \Sigma)$, and $\hat{R} \subseteq R$.
Given a text $T$, grammar compression is a method for lossless text data compression
that constructs a restricted CFG uniquely deriving the text $T$. For $G$ to
be deterministic, the production rule for each variable $v \in V$ must be unique. In
what follows, we assume that every grammar is deterministic and each production
rule is $v_i \to expr_i$, where $expr_i$ is an expression either $expr_i = a$ ($a \in \Sigma$) or
$expr_i = v_{j_1} v_{j_2} \ldots v_{j_n}$ ($i > j_k$ for all $1 \le k \le n$). To estimate the effectiveness for
compression, we use the size of the constructed grammar, which is defined as the
total length of the right-hand side of all production rules of the grammar.
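
As a small illustration of $val(\cdot)$ and of the grammar size, the sketch below stores a deterministic grammar as a Python dictionary from each variable to its right-hand side. The dictionary representation and the function names are assumptions of this example; the grammar itself is the one given for abracadabra in Fig. 7.4.

```python
def val(rules, v):
    """Expand variable v into the string it derives."""
    return "".join(val(rules, s) if s in rules else s for s in rules[v])


def grammar_size(rules):
    """Total length of the right-hand sides of all production rules."""
    return sum(len(rhs) for rhs in rules.values())


# Grammar of Fig. 7.4 for T = abracadabra.
RULES = {
    "va": ["a"], "vb": ["b"], "vr": ["r"], "vc": ["c"], "vd": ["d"],
    "v1": ["va", "vb"],
    "v2": ["v1", "vr"],
    "v3": ["v2", "va"],
    "S": ["v3", "vc", "va", "vd", "v3"],
}
```

Here `val(RULES, "S")` yields `abracadabra` and `grammar_size(RULES)` is 16, in agreement with the caption of Fig. 7.4.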

While the problem of constructing the smallest such grammar for a given text is 
known to be NP-hard [10], several approximation algorithms have been proposed. 
One of them is RePair [26], which is an off-line grammar compression algorithm. 
Despite its simple scheme, RePair is known for its high compression in practice [12, 
19, 52], and hence, it has been comprehensively studied. Some examples of studies 
on the RePair algorithm include its extension to an online algorithm [35], practical 
working time/space improvements [6, 47], applications to various fields [12, 27, 49], 
and theoretical analysis of generated grammar sizes [10, 40, 42]. 


7.4 Framework for Compressed Pattern Matching 


A grammar compressed text can be expressed in a framework called collage sys- 
tems [21]. A pattern matching algorithm on the compressed text can then be obtained 
as an instance of the general algorithm on the collage system. The algorithm on collage
systems can be understood as an extension of the Knuth-Morris-Pratt method (KMP
method) [14].


7.4.1 KMP Method 


The KMP method is a well-known linear-time algorithm for pattern matching on an
ordinary (uncompressed) text. Its movement can be modeled as a linear automaton
(KMP automaton) for a given pattern $P$.

For a given pattern $P$, a KMP automaton consists of two functions:


goto function $g : Q \times \Sigma \to Q \cup \{fail\}$,
failure function $f : Q \setminus \{0\} \to Q$,


where $Q = \{0, 1, \ldots, |P|\}$ is the set of states, and $fail$ is a special symbol that
is not included in $Q$. For $j \in Q$ and $a \in \Sigma$, the goto function $g$ returns $j + 1$ if
$P[j + 1] = a$ holds, otherwise it returns $fail$. For $j = 0$, let $g(0, a) = 0$ for all $a \in \Sigma$
where $P[1] \ne a$ holds. For $j \in Q \setminus \{0\}$, the failure function $f$ returns the maximum
integer $k < j$ such that $P[1 : k] = P[j - k + 1 : j]$ holds.
Fig. 7.1 KMP automaton for P = abacb. In this figure, each circle indicates a state, and the
double circle indicates the final state. Solid arrows and dashed arrows indicate the goto function
and failure function, respectively


text:  a b a c b b a b a a b a c b
state: 1 2 3 4 5 0 1 2 3 1 2 3 4 5   (state after reading each symbol, starting from state 0)


Fig. 7.2 Movement of KMP automaton. Solid arrows and dashed arrows indicate state transitions
caused by the goto function and the failure function, respectively


The automaton repeats state transitions by tracing g corresponding to the charac- 
ters read one by one from the input text. If g returns fail, then f is repeatedly called 
with the current state number to go back until a transition by g succeeds with the 
same character. When the automaton finally reaches the rightmost state, it can be 
judged that P has occurred. 

Figure 7.1 shows the KMP automaton for pattern P = abacb. The movement of 
the KMP automaton in Fig. 7.1 for the text T = abacbbabaabacb is shown in
Fig. 7.2. In this example, it can be judged that P has occurred when the automaton 
reaches the state number 5. 

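To eliminate the failure function, we define the state transition function $\delta : Q \times
\Sigma \to Q$ as follows:

$$\delta(j, a) = \begin{cases} g(j, a) & \text{if } g(j, a) \ne fail, \\ \delta(f(j), a) & \text{otherwise.} \end{cases}$$

Moreover, we extend it to $Q \times \Sigma^*$ as follows:

$$\delta^*(j, \varepsilon) = j, \qquad \delta^*(j, ua) = \delta(\delta^*(j, u), a),$$

where $j \in Q$, $u \in \Sigma^*$, and $a \in \Sigma$.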
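
The failure function and the transition function $\delta$ can be sketched directly from these definitions. The code below is a minimal illustration with hypothetical function names; Python strings are 0-indexed, so `pattern[j]` plays the role of $P[j + 1]$, and the reported positions are 1-based end positions of occurrences.

```python
def build_failure(pattern):
    """fail[j] = largest k < j with P[1:k] = P[j-k+1:j]; fail[0] is unused."""
    m = len(pattern)
    fail = [0] * (m + 1)
    k = 0
    for j in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[j - 1]:
            k = fail[k]
        if pattern[k] == pattern[j - 1]:
            k += 1
        fail[j] = k
    return fail


def delta(pattern, fail, j, a):
    """State transition: follow failure links until the goto function succeeds."""
    while True:
        if j < len(pattern) and pattern[j] == a:
            return j + 1            # g(j, a) = j + 1
        if j == 0:
            return 0                # g(0, a) = 0 when P[1] != a
        j = fail[j]


def search(pattern, text):
    """Return the 1-based end positions of all occurrences of pattern in text."""
    fail = build_failure(pattern)
    state, ends = 0, []
    for i, a in enumerate(text, start=1):
        state = delta(pattern, fail, state, a)
        if state == len(pattern):
            ends.append(i)
    return ends
```

For the pattern abacb and the text abacbbabaabacb used in Figs. 7.1 and 7.2, `search` reports the end positions 5 and 14, which are the two moments at which the automaton reaches state 5.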




7.4.2 Collage System 


A collage system is a pair $(D, S)$ defined as follows: $D$ is a sequence of assignments
$X_1 = expr_1; X_2 = expr_2; \ldots; X_n = expr_n$, where each $X_k$ is a token and $expr_k$ is
any of the form:

$a$ for $a \in \Sigma \cup \{\varepsilon\}$ (primitive assignment)
$X_i X_j$ for $i, j < k$ (concatenation)
${}^{[j]}X_i$ for $i < k$ and an integer $j$ (prefix truncation)
$X_i^{[j]}$ for $i < k$ and an integer $j$ (suffix truncation)
$(X_i)^j$ for $i < k$ and an integer $j$ ($j$ times repetition)

Let the set of all tokens in $D$ be denoted by $F(D)$. Each token represents a
string obtained by evaluating the expression as it implies. Let the string represented
by token $X \in F(D)$ be denoted by $X.u$. For example, for $X_1 = a$; $X_2 = b$; $X_3 =
X_1 X_2$; $X_4 = (X_3)^3$; $X_5 = X_4^{[1]}$, we have $X_4.u = ababab$ and $X_5.u = ababa$. However,
in this section we identify token $X$ with the string it represents, and simply denote
both by $X$ unless confusion occurs.

Let the number of assignments in $D$ be the size of $D$, and denote it by $\|D\|$, that
is, $\|D\| = |F(D)| = n$. For a sequence $S = X_{i_1}, X_{i_2}, \ldots, X_{i_k}$ of tokens defined in
$D$, we denote by $|S|$ the number $k$ of tokens in $S$.

The collage system $(D, S)$ represents the string obtained by concatenating
$X_{i_1}, \ldots, X_{i_k}$. That is, $D$ and $S$ correspond to the dictionary and compressed text
in a compression method, respectively. Both $D$ and $S$ can be encoded in various
ways. The compression ratios therefore depend on their encoding sizes rather than
$\|D\|$ and $|S|$.

A collage system is said to be truncation-free if D contains no truncation oper- 
ation. A collage system is said to be regular if D contains neither truncation nor 
repetition operations. A regular collage system is said to be simple if for every 
assignment X = YZ, |Y.u| = 1 or |Z.u| = 1. 
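
A dictionary $D$ can be evaluated token by token. The sketch below assumes a hypothetical tuple encoding of the five expression forms and assumes that prefix (suffix) truncation by $j$ removes the first (last) $j$ symbols, which matches the example $X_5.u = ababa$ above.

```python
def evaluate(D):
    """Map each token name to the string X.u it represents."""
    u = {}
    for name, expr in D:
        kind = expr[0]
        if kind == "prim":                 # a, a symbol or the empty string
            u[name] = expr[1]
        elif kind == "concat":             # X_i X_j
            u[name] = u[expr[1]] + u[expr[2]]
        elif kind == "ptrunc":             # drop the first j symbols
            u[name] = u[expr[1]][expr[2]:]
        elif kind == "strunc":             # drop the last j symbols
            s = u[expr[1]]
            u[name] = s[:len(s) - expr[2]]
        elif kind == "repeat":             # (X_i)^j
            u[name] = u[expr[1]] * expr[2]
        else:
            raise ValueError("unknown expression: " + kind)
    return u


EXAMPLE_D = [
    ("X1", ("prim", "a")),
    ("X2", ("prim", "b")),
    ("X3", ("concat", "X1", "X2")),
    ("X4", ("repeat", "X3", 3)),
    ("X5", ("strunc", "X4", 1)),
]
```

Evaluating `EXAMPLE_D` gives `ababab` for `X4` and `ababa` for `X5`. This dictionary is truncation-free only up to `X4`; the last assignment makes it neither truncation-free nor regular.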


7.4.3 Pattern Matching on Collage Systems 


The basic idea of pattern matching on a collage system is to simulate the movement
of the KMP automaton on uncompressed text. Using the state transition function
$\delta^*$ of the KMP automaton defined in Sect. 7.4.1, we define the function $Jump : Q \times
F(D) \to Q$ as follows:

$$Jump(j, X) = \delta^*(j, X.u).$$

The intent of $Jump$ is to simulate the state transition of the original KMP automaton
by jumping when it receives token $X$. Moreover, for any $j \in Q$ and $X \in F(D)$, we
define the set $Output(j, X) = Occ_P(P[1 : j] \cdot X.u)$, where $Occ_P(x)$ indicates the
set of all indices of occurrences of $P$ within $x$.
Fig. 7.3 A matching algorithm on a collage system


Input. Collage system $(D, S)$ and pattern $P = P[1 : m]$.
Output. All positions of occurrences of $P$ in $T$.
/* Preprocessing */
Collect the information required to calculate Jump and Output;
/* Scanning compressed text */
let $S = X_{i_1}, X_{i_2}, \ldots, X_{i_n}$;
$\ell$ := 0; state := 0;
for k := 1 to n do begin
    for each $p \in Output(state, X_{i_k})$ do
        pattern $P$ occurs at $\ell + p - m + 1$;
    state := $Jump(state, X_{i_k})$;
    $\ell$ := $\ell + |X_{i_k}.u|$
end


For a given collage system (D, S) representing text T and for a given pattern P, 
the pattern matching algorithm preprocesses the information required to calculate 
Jump and Output from D, and then performs matching while scanning a sequence 
of tokens in S one by one from the head (Fig. 7.3). 

From the results of [21], the following theorem is obtained. 


Theorem 1 (Theorem 3 of [21]) For a collage system $(D, S)$, the compressed
pattern matching problem can be solved in $O(\|D\| + |S| + m^2 + R)$ time using
$O(\|D\| + m^2)$ space if $(D, S)$ is regular, where $R$ is the number of occurrences of
pattern $P$ in the text represented by $(D, S)$.


This theorem applies to both RePair and Repair-VF because texts compressed by 
these can be described by regular collage systems. 


7.5 Repair-VF 


This section first introduces RePair and then gives an outline of Repair-VF. Repair- 
VF has a structure that combines RePair with a fixed-length coding. Please refer 
to the literature [56] for the details of Repair-VF and experimental results for its 
performance. 


7.5.1 RePair 


RePair is a grammar compression algorithm that was proposed by Larsson and Moffat
[26]. For input text $T$, let $G = (V, \Sigma, S, R)$ be the grammar constructed by
RePair. The RePair procedure can then be described by the following steps:


Step 1. Replace each symbol $a \in \Sigma$ with a new variable $v_a$ and add $v_a \to a$ to $R$.
Step 2. Find the most frequent pair $p$ in $T$.
$v_\alpha \to \alpha$ ($\alpha$ = a, b, r, c, d)
$v_1 \to v_a v_b$
$v_2 \to v_1 v_r$
$v_3 \to v_2 v_a$
$S \to v_3 v_c v_a v_d v_3$


Fig. 7.4 Example of the grammar generation process of RePair for T = abracadabra. The generated
grammar is $(\{v_a, v_b, v_r, v_c, v_d, v_1, v_2, v_3, S\}, \{a, b, r, c, d\}, S, \{v_a \to a, v_b \to b, v_r \to r, v_c \to c, v_d \to
d, v_1 \to v_a v_b, v_2 \to v_1 v_r, v_3 \to v_2 v_a, S \to v_3 v_c v_a v_d v_3\})$ with a size of 16


Step 3. Replace every occurrence (or as many occurrences as possible if $p$ is a
pair consisting of the same symbol) of $p$ with a new variable $v$, and then add
$v \to p$ to $R$.

Step 4. Re-evaluate the frequencies of pairs for the updated text generated in Step
3. If the maximum frequency is 1, add $S \to$ (current text $T$) to $R$, and terminate.
Otherwise, return to Step 2.


Figure 7.4 illustrates an example of the grammar generation process of RePair. 
The following theorem relates to the performance of RePair shown by Larsson 
and Moffat [26]. 


Theorem 2 ([26]) RePair works in $O(n)$ expected time and $5n + 4k^2 + 4k' +
\lceil \sqrt{n + 1} \rceil - 1$ words of space, where $n$ is the length of the source text, $k$ denotes
the cardinality of the source alphabet, and $k'$ denotes the cardinality of the final
dictionary.
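
Steps 1 to 4 can be sketched naively as follows. This version recounts all pairs after every replacement, so it runs in quadratic time and does not use the data structures behind the $O(n)$ bound of Theorem 2. The variable names (`v_a` for the variable of symbol a, `v1`, `v2`, ... for new variables) and the tie-breaking rule (the first pair encountered among the most frequent ones) are assumptions of this example.

```python
from collections import Counter


def count_pairs(seq):
    """Count adjacent pairs; overlapping occurrences of a pair xx are counted once."""
    counts = Counter()
    last_same = -2
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if pair[0] == pair[1]:
            if last_same == i - 1:
                continue            # overlaps the occurrence counted at i - 1
            last_same = i
        counts[pair] += 1
    return counts


def replace_pair(seq, pair, name):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(name)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return the production rules built by a naive RePair run on text."""
    rules = {}
    for a in text:                              # Step 1
        rules.setdefault("v_" + a, [a])
    seq = ["v_" + a for a in text]
    k = 0
    while len(seq) >= 2:
        counts = count_pairs(seq)               # Steps 2 and 4
        pair = max(counts, key=counts.get)      # first pair among the most frequent
        if counts[pair] < 2:
            break
        k += 1
        name = "v" + str(k)
        rules[name] = list(pair)                # Step 3
        seq = replace_pair(seq, pair, name)
    rules["S"] = seq
    return rules
```

On abracadabra this reproduces the rules of Fig. 7.4, ending with `S` derived as `v3 v_c v_a v_d v3` and a grammar size of 16. Other tie-breaking rules can give different grammars of the same or different size.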


7.5.2 Outline of Repair-VF 


The original RePair encodes the rules in R excluding S using Elias gamma coding, 
that is, each codeword has a variable length, whereas Repair-VF uses a fixed-length 
code. The right side of S corresponds to the compressed text, and is converted to a
sequence of fixed-length codewords of the rules in S. 

Consider the number of fixed-length coded rules. In the process of Step 1 of
RePair, $|\Sigma|$ rules are created. In addition, the process of Step 3 of RePair replaces
the most frequent pair and at the same time adds one rule to $R$. Let $s$ be the number
of rules which are added to $R$ in Step 3. The total number of rules is then $|\Sigma| + s$,
and thus each symbol can be fixed-length encoded with $\lceil \log(|\Sigma| + s) \rceil$ bits. The
information about $\Sigma$ can be restored from the rules added in Step 3, so there is no
need to explicitly save it. Therefore, only the rules added in Step 3 and the right side
of $S$ added last in Step 4 need to be saved. In the former, the right side of each rule
consists of two symbols, so the total number of symbols to be saved is $2s$. Since the
latter depends on the input text $T$, let $n$ be the length of $T$. The number of bits of the
output compressed data is then

$$(2s + n) \lceil \log(|\Sigma| + s) \rceil \qquad (7.1)$$

bits in total.

We want to find the best s that minimizes the total number of output bits. Note that 
the final output tends to be smaller as s increases up to some point. In RePair, Step 3 is 
repeated until the frequency of the most frequent pair becomes 1. In the case of using 
a fixed-length code as above, this increases useless rules. Increasing the number of 
rules can increase the bit length per symbol, resulting in a longer final bit length. 
We can eliminate the waste by terminating the process in the middle. However, it is 
difficult to determine on the fly the s at which the output becomes minimal. Even 
after the first time the output size increases, the length of S may become shorter by 
continuing Step 3, and the output size may decrease again. 

Therefore, during the processing of RePair, we record the minimum value of the 
output size and the corresponding s by calculating Equation (7.1) every time a rule 
is added in Step 3. Note that the calculation of Equation (7.1) does not require actual 
coding or outputting since a fixed-length code is used. Finally, when S is output in 
Step 4, we can obtain the smallest output by outputting while expanding the rules 
added after the best s. 
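
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the implementation of the original paper: the function names, the use of base-2 logarithms, and the list seq_lengths (assumed to hold the value of n in Equation (7.1) after s rules have been added, for s = 0, 1, ...) are assumptions made here.

```python
def repair_vf_bits(sigma_size, s, n):
    """Evaluate Equation (7.1): (2s + n) * ceil(log2(|Sigma| + s)) bits."""
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)) exactly.
    bits_per_symbol = (sigma_size + s - 1).bit_length()
    return (2 * s + n) * bits_per_symbol


def best_stop(sigma_size, seq_lengths):
    """Return (best s, minimal bits), given seq_lengths[s] for s = 0, 1, ...

    seq_lengths[s] is assumed to be the n of Equation (7.1) after s rules
    have been added in Step 3 (hypothetical input for this sketch).
    """
    best_s, best_bits = 0, repair_vf_bits(sigma_size, 0, seq_lengths[0])
    for s in range(1, len(seq_lengths)):
        bits = repair_vf_bits(sigma_size, s, seq_lengths[s])
        if bits < best_bits:  # record the minimum seen so far
            best_s, best_bits = s, bits
    return best_s, best_bits


if __name__ == "__main__":
    # Hypothetical run: alphabet of 4 symbols, sequence shrinking as rules are added.
    lengths = [100, 80, 65, 55, 50, 49]
    print([repair_vf_bits(4, s, n) for s, n in enumerate(lengths)])
    print(best_stop(4, lengths))
```

With these hypothetical lengths the size first rises at s = 1 and then falls again until s = 4, which is why the minimum is tracked over the whole run instead of stopping at the first increase.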

This is Repair-VF (called Repair-VF-best in the original paper). The suffix “VF”
comes from an abbreviation for variable-to-fixed length coding (VF coding). For the
input text T, each rule of the output grammar G corresponds to a substring of T, and
the right-hand side of S can be regarded as the variable length factorization of T.
Thus, Repair-VF can be viewed as a VF coding from the viewpoint of information 
source coding. 


7.6 MR-Repair 


In this section we outline MR-Repair, which is a method to reduce the output grammar 
size by focusing on the relationship between RePair and maximal repeats. Please refer 
to the literature [17] for the details of MR-Repair and experimental results for its 
performance. 


7.6.1 Maximal Repeats 


Let s be a substring of text T. If the frequency of s is greater than 1, s is called a
repeat. A left (or right) extension of s is any substring of T in the form ws (or sw),
where $w \in \Sigma^*$. We define s as left (or right) maximal if left (or right) extensions
of s occur a strictly lower number of times in T than s. Accordingly, s is a maximal
repeat of T if s is both left and right maximal. In this chapter, we only consider strings
with a length of more than 1 as maximal repeats. For example, the substring abra
of T = abracadabra is a maximal repeat, whereas br is not.
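
The definition can be checked by brute force on small texts. The sketch below is an illustration only (cubic in the text length, with names chosen here); it counts overlapping occurrences and tests one-symbol extensions, which suffices because a longer extension cannot occur more often than the one-symbol extension it contains.

```python
from collections import Counter


def maximal_repeats(text):
    """Return {maximal repeat: frequency} for repeats of length > 1 (brute force)."""
    n = len(text)
    # Frequency of every substring, counting overlapping occurrences.
    counts = Counter(text[i:j] for i in range(n) for j in range(i + 1, n + 1))
    alphabet = set(text)
    result = {}
    for sub, freq in counts.items():
        if len(sub) < 2 or freq < 2:
            continue
        # Counter returns 0 for substrings that never occur.
        left_maximal = all(counts[c + sub] < freq for c in alphabet)
        right_maximal = all(counts[sub + c] < freq for c in alphabet)
        if left_maximal and right_maximal:
            result[sub] = freq
    return result


if __name__ == "__main__":
    print(maximal_repeats("abracadabra"))
```

For T = abracadabra this reports only abra: the repeat br is rejected because its left extension abr occurs just as often as br itself.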

The following theorem describes an essential property of RePair, that is, RePair
recursively replaces the most frequent maximal repeats.


Theorem 3 (Theorem 1 of [17]) Let T be a given text, under the condition that 
every most frequent maximal repeat of T does not appear overlapping itself. Let f 
be the frequency of the most frequent pairs of T, and t be a text obtained after all 
pairs with frequency f in T are replaced by variables. There is then a text s such 
that s is obtained after all maximal repeats with frequency f in T are replaced by 
variables, and s and t are isomorphic to each other. 


7.6.2 MR Order 


According to Theorem 1 of [17], if there is just one most frequent maximal repeat 
in the current text, then RePair replaces all occurrences of it step by step. However, 
a problem arises if there are two or more most frequent maximal repeats, with some 
of them overlapping. In this case, the selection order of pairs (of course, they are 
most frequent) affects the priority of maximal repeats. We call this order of selecting 
(summarizing) maximal repeats the maximal repeat selection order (or simply MR- 
order). Note that the selection order of pairs actually depends on the implementation 
of RePair. If there are several distinct most frequent pairs with overlaps, RePair 
constructs grammars with different sizes according to the selection order of the 
pairs. 

However, the following theorem states that the MR-order rather than the replace- 
ment order of pairs determines the size of the grammar generated by RePair. 


Theorem 4 (Theorem 2 of [17]) The sizes of grammars generated by RePair are 
the same if they are generated in the same MR-order. 


7.6.3 Algorithm 


The main strategy of the proposed method is to recursively replace the most frequent 
maximal repeats instead of the most frequent pairs. 


Definition 1 (Definition 3 of [17]) For an input text T, let $G = (V, \Sigma, S, R)$ be the
grammar generated by MR-Repair. MR-Repair constructs G through the following
steps:


Step 1. Replace each symbol $a \in \Sigma$ with a new variable $v_a$ and add $v_a \to a$ to
R.

Step 2. Find the most frequent maximal repeat r in T.

Step 3. Check if |r| > 2 and r[1] = r[|r|], and if so, use r[1 : |r| − 1] instead of
r in Step 4.

Step 4. Replace every occurrence of r with a new variable v and then add $v \to r$
to R.

Step 5. Re-evaluate the frequencies of maximal repeats for the updated text gen-
erated in Step 4. If the maximum frequency is 1, add $S \to$ (current text) to R and
terminate. Otherwise, return to Step 2.


Fig. 7.5 Example of the grammar generation process of MR-Repair for T = abracadabra. The gen-
erated grammar is $\{\{v_a, v_b, v_r, v_c, v_d, v_1, v_2, S\}, \{a, b, r, c, d\}, S, \{v_a \to a, v_b \to b, v_r \to r, v_c \to c, v_d \to d, v_1 \to v_a v_b v_r, v_2 \to v_1 v_a, S \to v_2 v_c v_a v_d v_2\}\}$ with a size of 15
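
The five steps can be traced with a naive Python sketch. It is not the implementation of [17]: maximal repeats are found by brute force, ties between equally frequent repeats are broken by length and then lexicographically, and occurrences are replaced left to right without overlap. All of these choices, as well as the variable names v_a, v1, v2, ..., are assumptions made for illustration. The text is assumed not to contain the character S.

```python
from collections import Counter


def most_frequent_maximal_repeat(seq):
    """Brute-force Step 2: a most frequent maximal repeat of length > 1, or None."""
    n = len(seq)
    counts = Counter(tuple(seq[i:j]) for i in range(n) for j in range(i + 2, n + 1))
    symbols = set(seq)
    best = None
    for sub, freq in counts.items():
        if freq < 2:
            continue
        # Reject sub unless every one-symbol extension is strictly less frequent.
        if any(counts[(c,) + sub] >= freq or counts[sub + (c,)] >= freq
               for c in symbols):
            continue
        key = (freq, len(sub), sub)  # tie-breaking rule assumed for this sketch
        if best is None or key > best:
            best = key
    return None if best is None else best[2]


def replace_all(seq, r, var):
    """Step 4: replace occurrences of r, scanning left to right without overlap."""
    out, i, m = [], 0, len(r)
    while i < len(seq):
        if tuple(seq[i:i + m]) == r:
            out.append(var)
            i += m
        else:
            out.append(seq[i])
            i += 1
    return out


def mr_repair(text):
    """Return the rule set R as {variable: right-hand side tuple}."""
    rules = {"v_" + a: (a,) for a in sorted(set(text))}  # Step 1
    seq = ["v_" + a for a in text]
    fresh = 0
    while True:
        r = most_frequent_maximal_repeat(seq)  # Step 2 (and the check of Step 5)
        if r is None:  # no repeat of length > 1 is left
            break
        if len(r) > 2 and r[0] == r[-1]:  # Step 3
            r = r[:-1]
        fresh += 1
        var = "v" + str(fresh)
        seq = replace_all(seq, r, var)  # Step 4
        rules[var] = r
    rules["S"] = tuple(seq)  # Step 5
    return rules


def expand(rules, symbol):
    """Expand a symbol back into the text it derives."""
    if symbol not in rules:
        return symbol  # terminal
    return "".join(expand(rules, x) for x in rules[symbol])


if __name__ == "__main__":
    g = mr_repair("abracadabra")
    for var, rhs in g.items():
        print(var, "->", " ".join(rhs))
    print("size:", sum(len(rhs) for rhs in g.values()))
```

For T = abracadabra the sketch first selects abra, shortens it to abr in Step 3, and then replaces v1 v_a, which matches the grammar of size 15 in Fig. 7.5.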


Figure 7.5 shows an example of the grammar generation process of MR-Repair.
As shown in this figure, the size of the grammar generated by MR-Repair is smaller
than that generated by RePair for the same text (see the RePair example figure earlier in this chapter).


Theorem 5 (Theorem 5 of [17]) Assume that RePair and MR-Repair work based
on the same MR-order for a given text. Let $g_{rp}$ and $g_{mr}$ be the sizes of the grammars
generated by RePair and MR-Repair, respectively. Then, $\frac{1}{2} g_{rp} < g_{mr} \le g_{rp}$ holds.


7.7 Conclusion


In this chapter, we outlined research on compressed pattern matching and showed that
we can speed up pattern matching by selecting a suitable compression method. This 
has led to the development of compression methods that are useful for pattern match- 
ing. Whereas the initially developed compression methods had low compression per- 
formance, Repair-VF [56] achieves both a good compression rate and good matching 
speed by combining advanced grammar compression with fixed-length code. Collage 
systems [21] provide a unified algorithm for compressed pattern matching, allowing 
us to obtain an efficient pattern matching algorithm for grammar compression as an 
instance of the unified algorithm. 

Since proposing Repair-VF, we have proposed several improvements for it. LT- 
Repair [35] improves RePair processing semi-online by adding the constraint called 
the left-tall condition to its grammar. This makes it possible to efficiently compress 
large-scale text data with small memory. 

MR-Repair [17], which we have recently proposed, is a method that reduces the 
output grammar size by focusing on the relationship between RePair and maximal 
repeats. Although heuristic improvements [4, 20, 28, 36] focusing on non-maximal 
repetitive substrings have previously been proposed, MR-Repair is superior because
it has been proven theoretically to generate a smaller grammar than the original RePair.
A topic for future work is to see whether compressed pattern matching using these 
methods can be performed efficiently. 

In the present work, we mainly explained the compressed pattern matching prob- 
lem based on text scanning. However, the succinct index technology which combines 
text index and data compression was also established in 2000. This is an indexing 
technology that utilizes a succinct data structure that can solve query processing with 
a small space of almost the information-theoretic lower bound. Succinct index has 
an excellent property that allows full-text searching while compressing a target text 
smaller than the original text. For details of this technology, refer to the excellent 
book by Navarro [37]. 

In terms of online grammar compression methods, there exists FOLCA proposed 
by Maruyama et al. [33] and its improvement SOLCA proposed by Takabatake et 
al. [50]. FOLCA is based on a string factorization called edit-sensitive parsing. It 
performs factorization and grammar generation in parallel while reading an input text 
sequentially from the beginning. It is known that straight line programs (restricted 
CFGs) generated by FOLCA can be used as index structures [5]. 

In recent years, Martinez et al. [31] proposed a novel compression method called 
Marlin and an improvement of it [32]. These methods achieve both decompres- 
sion at ultrahigh-speed and good performance in terms of compression ratio. If we 
can decompress compressed data at sufficiently high speed, we can perform pattern 
matching efficiently even if it is performed after decompressing the data. Compara- 
tive studies on these approaches are also left for future work. 


Chapter 8
Orthogonal Range Search Data Structures


Kazuki Ishiyama and Kunihiko Sadakane 


Abstract We first review existing space-efficient data structures for the orthogonal 
range search problem. Then, we propose two improved data structures, the first of 
which has better query time complexity than the existing structures and the second 
of which has better space complexity that matches the information-theoretic lower 
bound. 


8.1 Introduction 


Consider a set P of n points in the d-dimensional space $\mathbb{R}^d$. Given an orthogonal
range $Q = [l_0^{(Q)}, u_0^{(Q)}] \times \cdots \times [l_{d-1}^{(Q)}, u_{d-1}^{(Q)}]$, the problem of answer-
ing queries for information on $P \cap Q$, the subset of P contained in the range Q, is
called the orthogonal range search problem, and is one of the fundamental problems
in computational geometry.

The information obtained about $P \cap Q$ differs depending on the query. The most
basic queries are the reporting query, which enumerates all the points in $P \cap Q$,
and the counting query, which returns the number of points $|P \cap Q|$. There are
other queries such as the emptiness query, which checks whether $P \cap Q$ is empty or
not, and aggregate queries, which compute the summation, average, or variance of
weights of points in the query range.
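
As a baseline, the three basic queries can be written as a linear scan over the point set. The sketch below is an illustration with names chosen here; a range is passed as two tuples lo and hi holding the lower and upper bounds of each dimension, and the sample data are hypothetical.

```python
def in_range(p, lo, hi):
    """Check lo[i] <= p[i] <= hi[i] in every dimension, in O(d) time."""
    return all(l <= x <= u for x, l, u in zip(p, lo, hi))


def report(points, lo, hi):
    """Reporting query: enumerate all points of P inside Q."""
    return [p for p in points if in_range(p, lo, hi)]


def count(points, lo, hi):
    """Counting query: the number of points of P inside Q."""
    return sum(1 for p in points if in_range(p, lo, hi))


def is_empty(points, lo, hi):
    """Emptiness query: True if no point of P lies inside Q."""
    return not any(in_range(p, lo, hi) for p in points)


if __name__ == "__main__":
    # Hypothetical records: (years of service, age, annual income).
    employees = [(5, 30, 400), (12, 45, 700), (2, 25, 300)]
    lo, hi = (3, 28, 350), (15, 50, 800)
    print(report(employees, lo, hi), count(employees, lo, hi), is_empty(employees, lo, hi))
```

Each query costs O(dn) time here, which is the cost that the index structures discussed in this chapter aim to beat.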

Applications of the orthogonal range search problem include database searches 
[21]. For example, assuming there is a database of employees of a company, then
a query to count the number of employees whose duration of service is at least $x_1$
years and at most $x_2$ years, age is at least $y_1$ and at most $y_2$, and annual income is
at least $z_1$ and at most $z_2$, can be formalized as an orthogonal range search problem.
Other applications include geographical information systems, CAD, and computer 
graphics. 

In such applications, it is common to perform multiple queries on the same point
set P. We therefore consider constructing the problem as an indexing problem: Given 
a point set P a priori, we first construct some data structure D from P. Then, when 
a query range Q is given, we answer the query using the data structure D. 


8.1.1 Existing Work 


In many existing works, the number n of points is regarded as a variable for evaluating
time complexity and the number d of dimensions is regarded as a constant. However,
in this chapter, we regard d as a variable too. For the computation model, we use
the w-bit word RAM where $w = \Theta(\lg n)$. That is, a constant number of coordinate
values can be treated in constant time. Then, it takes O(d) time to check whether a
point is inside a query range.

If the data structure is allowed to use more than $\Theta(dn)$ words of space
and if we assume that d is a constant, then we can perform the counting
and reporting queries in time polynomial in log n. Range trees [2, 14, 15, 23] are
such data structures. Using the fractional cascading technique [15, 23], range trees
support counting queries in $O(d \log^{d-1} n)$ time and reporting queries in
$O(d \log^{d-1} n + dk)$ time using $O(dn \log^{d-1} n)$ words of space,
where $k = |P \cap Q|$ is the number of points enumerated by a reporting query.
Although these data structures are
time-efficient, it is desirable to develop more space-efficient data structures.

Some data structures having linear space complexity have been proposed. For
example, quad trees [6] were the first data structures used for orthogonal range
search. Unfortunately, quad trees have terrible worst-case behaviors. To overcome
this, the kd-tree [1] is used. The query time complexity of the kd-tree is
$O(d^2 n^{(d-1)/d})$ for counting and $O(d^2 n^{(d-1)/d} + dk)$ for reporting [13].

These data structures store the coordinates of points separately in plain form,
and therefore can be applied to the case of real-valued coordinates. However, if the
coordinates take integer values from 0 to n − 1, then there exist data structures with
even smaller space complexity and query time complexity. For example, Chazelle [4]
proposed a data structure for the two-dimensional case with linear space complexity
and time complexity of $O(\lg n)$ for counting and $O(\lg n + k \lg^{\varepsilon} n)$ for reporting where
$0 < \varepsilon < 1$ is any constant. Note that although the assumption that each coordinate
value is an integer from 0 to n − 1 seems too strict, as is explained in Sect. 8.2.2, any
orthogonal range search problem in d-dimensional space can be reduced into one on
the $[n]^d$ grid, and therefore the assumption does not create any difficulties.

There has also been research on succinct data structures for the orthogonal range
search problem. The wavelet tree [9] is a data structure which was originally pro-
posed for representing compressed suffix arrays, and it later turned out that the wavelet
tree can support various queries efficiently [18]. For the orthogonal range search
problem, the wavelet tree can support counting queries in $O(\lg n)$ time and reporting
queries in $O((1 + k) \lg n)$ time [8]. Bose et al. [3] proposed improved succinct data
structures that support counting queries in $O(\lg n / \lg \lg n)$ time and reporting queries
in $O((1 + k) \lg n / \lg \lg n)$ time for two-dimensional cases.

For higher dimensions, Okajima and Maruyama [20] proposed the KDW-tree,
which is a succinct data structure for any dimensionality. The query time complexity
of the KDW-tree is smaller than that of the kd-tree. If we assume d is a constant, count-
ing queries take $O(n^{(d-2)/d} \lg n)$ time and reporting queries take $O((n^{(d-2)/d} + k) \lg n)$
time. The KDW-tree has been shown to be practical by numerical experiments.


8.1.2 Our Results 


We show space and time complexities of data structures for the orthogonal range 
search problem explained in Sect. 8.1.1 and our proposed data structures in Table 8.1. 
Note that these are for the case where the coordinates are integers from 0 to n — 1, 
and the space complexities are measured in bits. Table 8.1 shows reporting time 
complexities. Counting time complexities can be obtained by letting k = 0. 

Our data structures are space-efficient for high-dimensional orthogonal range 
search problems. 

Our first data structure has the same space complexity as the KDW-tree and better
query time complexities. Note that the result in Table 8.1 is for the case of d ≥ 3. If
d = 2, we can improve the $n^{(d-2)/d}$ term to lg n. This result appeared in [11].

Note that, as shown in Sect. 8.2.1, the necessary space to represent a set of n points
in d-dimensional space such that each coordinate takes an integer value from 0 to
n − 1 is $(d - 1) n \lg n + \Theta(n)$ bits. This means that if we assume d is a constant, the
space complexity of the KDW-tree and our first data structure does not match the
information-theoretic lower bound asymptotically.


Table 8.1 Comparison of complexities. The results for KDW-tree and Ours 1 are for d ≥ 3. Note
that k is the number of points enumerated by a reporting query. The time complexities for counting
queries are obtained by letting k = 0 in the time complexities for reporting queries


Data structure     Dim.   Space (bits)                    Query time

kd-tree [1]        d      O(dn lg n)                      O(d^2 n^{(d-1)/d} + dk)

Wavelet tree [9]   2      n lg n + o(n lg n)              O((1 + k) lg n)

Bose et al. [3]    2      n lg n + o(n lg n)              O((1 + k) lg n / lg lg n)

KDW-tree [20]      d      d{n lg n + o(n lg n)}           O((poly(d) n^{(d-2)/d} + dk) lg n)

Ours 1             d      d{n lg n + o(n lg n)}           O((n^{(d-2)/d} + dk) lg n / lg lg n)

Ours 2             d      (d − 1){n lg n + o(n lg n)}     O(dn lg n)

Our second data structure uses $(d - 1) n \lg n + (d - 1) \cdot o(n \lg n)$ bits of space.
This asymptotically matches the information-theoretic lower bound even if d is 
assumed to be a constant. Therefore, we can say this data structure is truly suc- 
cinct. Unfortunately, the worst-case query time complexity is O(dn lg n), which is 
not fast in theory. However, this data structure is fast in practice for the case where 
the number d of dimensions is large but the number d' of dimensions used for a query 
is small. This kind of query often occurs in the database search applications shown 
in Sect. 8.1. This result appeared in [10]. 


8.2 Preliminaries 


In this chapter, we assume that coordinates of points are non-negative integers. As will
be explained in Sect. 8.2.2, we sometimes assume that coordinates are integers from
0 to n − 1. Therefore, we define [n] as the set {0, 1, ..., n − 1}. For a d-dimensional
space, we denote the dimensions by dim. 0, dim. 1, ..., dim. d − 1, and the coordinate values
of a point by the 0-th coordinate value, the 1-st coordinate value, ..., the (d − 1)-th coordinate
value. For a rooted tree, we assume the depth of the root node is 0. Throughout the
chapter, log x denotes the natural logarithm and lg x denotes the base 2 logarithm.

Next, we define two concepts used in this chapter. The first one is containment
degree. This is the concept of an inclusion relationship between two orthogonal ranges
introduced in [20]. For two d-dimensional orthogonal ranges
$Q = [l_0^{(Q)}, u_0^{(Q)}] \times \cdots \times [l_{d-1}^{(Q)}, u_{d-1}^{(Q)}]$ and
$R = [l_0^{(R)}, u_0^{(R)}] \times \cdots \times [l_{d-1}^{(R)}, u_{d-1}^{(R)}]$, we define CDeg(R, Q)
as

$\mathrm{CDeg}(R, Q) = \#\{\, i \in [d] \mid [l_i^{(R)}, u_i^{(R)}] \subseteq [l_i^{(Q)}, u_i^{(Q)}] \,\}$

and call it the containment degree of R with respect to Q. This is the number of
dimensions in each of which R is contained in Q. The containment degree is an
important concept for analyzing time complexities of orthogonal range search algo-
rithms.
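
The definition translates directly into code. In this small sketch (names chosen here for illustration), a range is a list of (lower, upper) pairs, one per dimension.

```python
def containment_degree(R, Q):
    """CDeg(R, Q): number of dimensions i with [l_i^R, u_i^R] inside [l_i^Q, u_i^Q]."""
    return sum(1 for (lr, ur), (lq, uq) in zip(R, Q) if lq <= lr and ur <= uq)


if __name__ == "__main__":
    R = [(2, 3), (0, 9)]
    Q = [(1, 5), (4, 6)]
    # R is contained in Q in dimension 0 only.
    print(containment_degree(R, Q))
```

A range R with CDeg(R, Q) = d is fully contained in Q, so no further checks are needed for the points inside R.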

Next, we explain z-value. This is a projection of multi-dimensional data onto one-
dimensional data as proposed by Morton [17]. Consider a point $p = (p_0, p_1, \ldots, p_{d-1})$
in the d-dimensional space where the coordinate values are integers. If
coordinate values are expressed as l-bit binary numbers $p_0 = b_0^0 b_0^1 \cdots b_0^{l-1}$,
$p_1 = b_1^0 b_1^1 \cdots b_1^{l-1}$, ..., $p_{d-1} = b_{d-1}^0 b_{d-1}^1 \cdots b_{d-1}^{l-1}$, the z-value z(p) of point p is defined
as

$z(p) = b_0^0 b_1^0 \cdots b_{d-1}^0 \; b_0^1 b_1^1 \cdots b_{d-1}^1 \; \cdots \; b_0^{l-1} b_1^{l-1} \cdots b_{d-1}^{l-1}.$

In the case of a two-dimensional space, if we arrange grid points in increasing order
of z-value, we see a z-shape curve as shown in Fig. 8.1. We therefore call the value
z-value.
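
The bit interleaving can be sketched as follows. The function name is chosen here, and the sketch assumes that the bit with superscript 0 is the most significant bit of each coordinate, which the text does not state explicitly.

```python
def z_value(point, bits):
    """Interleave the bits of the coordinates, most significant bit first."""
    z = 0
    for level in range(bits):  # level 0 is assumed to be the most significant bit
        shift = bits - 1 - level
        for coord in point:  # take b_0, b_1, ..., b_{d-1} of this level in order
            z = (z << 1) | ((coord >> shift) & 1)
    return z


if __name__ == "__main__":
    # p = (5, 3) with 3-bit coordinates: 101 and 011 interleave to 100111.
    print(z_value((5, 3), 3))
    # The four points of a 2 x 2 grid sorted by z-value.
    grid = [(a, b) for a in range(2) for b in range(2)]
    print(sorted(grid, key=lambda p: z_value(p, 1)))
```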

Fig. 8.1 Curve obtained by joining grid points in z-value order in two-dimensional space


8.2.1 Succinct Data Structures and Information-Theoretic Lower Bound


Succinctness of data structures was proposed by Jacobson [12] and is one of the 
criteria for measuring space complexities of data structures. It is defined as follows. 

Let n be the number of different values that an object can take. Then, we need at
least $\lceil \lg n \rceil$ bits of space to represent the object. If the space complexity S(n) of a
data structure representing the object satisfies $S(n) = \lg n + o(\lg n)$ bits, we say the
data structure is succinct and $\lceil \lg n \rceil$ bits is the information-theoretic lower bound of
the size of representations of the object. Note that succinct data structures not only
offer data compression, but also support some efficient queries. For orthogonal range
search, a naive algorithm supports linear time queries by scanning an array containing
coordinate values of points. Succinct data structures are therefore expected to answer
queries in sublinear time.

The space complexity of $\lg n + o(\lg n)$ bits in the definition of succinct data
structures indicates that the size of auxiliary indexing data structures added to the
data is negligibly small compared with the size of the data itself ($\lg n$ bits). In other
words, the space complexity of succinct data structures asymptotically matches the
information-theoretic lower bound when $n \to \infty$.

We compute the information-theoretic lower bound for representing a set of points
with integer coordinates. Assume that the i-th coordinate value takes integer values from
0 to $U_i - 1$. Because the number of grid points is $U = \prod_{i=0}^{d-1} U_i$, the number of different
sets of n points is

$\binom{\prod_{i=0}^{d-1} U_i}{n}.$

By using Stirling's approximation formula

$\log n! = n \log n - n + O(\log n),$

we obtain

\begin{align*}
\log \binom{U}{n} &= \log U! - \log (U - n)! - \log n! \\
&= U \log U - U - (U - n) \log (U - n) + (U - n) - n \log n + n + O(\log U) \\
&= U \log \frac{U}{U - n} + n \log \frac{U - n}{n} + O(\log U) \\
&= U \log \frac{U}{U - n} + n \log \left( \frac{U}{n} \left( 1 - \frac{n}{U} \right) \right) + O(\log U) \\
&= n \log \frac{U}{n} + (U - n) \log \frac{U}{U - n} + O(\log U) \\
&= n \log U - n \log n + \Theta(n).
\end{align*}

Therefore, the information-theoretic lower bound of the size for representing the
point set is

$\lg \binom{\prod_{i=0}^{d-1} U_i}{n} = \sum_{i=0}^{d-1} n \lg U_i - n \lg n + \Theta(n).$

Note that storing the coordinate values of the points explicitly using $\sum_{i=0}^{d-1} \lceil \lg U_i \rceil$ bits
per point uses about $n \lg n$ bits more space than the information-theoretic lower bound.
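
The gap between explicit storage and the lower bound can be checked numerically for small parameters. The sketch below is an illustration with names chosen here; it computes the binomial coefficient exactly with Python integers, so it is only practical for modest n.

```python
import math


def ceil_lg(x):
    """ceil(lg x) for an integer x >= 1, computed exactly."""
    return (x - 1).bit_length()


def lower_bound_bits(universe_sizes, n):
    """Bits needed to distinguish all sets of n points on the given grid."""
    U = math.prod(universe_sizes)  # number of grid points
    return ceil_lg(math.comb(U, n))


def explicit_bits(universe_sizes, n):
    """Bits used when every coordinate of every point is stored in plain form."""
    return n * sum(ceil_lg(u) for u in universe_sizes)


if __name__ == "__main__":
    n, d = 1024, 3
    sizes = [n] * d  # points on the grid [n]^d
    lb, ex = lower_bound_bits(sizes, n), explicit_bits(sizes, n)
    print(lb, ex, ex - lb, n * math.log2(n))
```

For n = 1024 and d = 3 the difference is somewhat below n lg n = 10240 bits, in line with the $\Theta(n)$ term of the bound.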


8.2.2 Assumptions on Point Sets 


Because data structures such as kd-tree or range trees that have linear or larger space
complexities usually store the coordinates of points in a plain format, we do not
care whether they are integers or real values. However, if we consider succinct data
structures, we usually assume that coordinate values are integers from 0 to n − 1. We
also assume that for any two distinct points $p, q \in P$ and any $i \in [d]$, the i-th coordinate value
$p_i$ of p and the i-th coordinate value $q_i$ of q are different. Although this assumption
may appear to be unrealistic and too strong, for the orthogonal range search problem,
it is known that an arbitrary point set on $\mathbb{R}^d$ can be transformed into a point set on
$[n]^d$ [7].

Consider a set P of n points on $\mathbb{R}^d$. We create another point set P' on $[n]^d$ as
follows. The set P' also contains n points and there is a one-to-one correspondence
between points in P and points in P'. Assume that $p \in P$ corresponds to $p' \in P'$.
Then, the i-th coordinate value $p'_i$ of p' is defined from the i-th coordinate value
$p_i$ of p as

$p'_i = \#\{\, q \in P \mid q_i < p_i \,\}.$    (8.1)


That is, the i-th coordinate value of p' is the number of points in P such that the i-th
coordinate value is smaller than $p_i$. This is called the rank value of p with respect
to the i-th coordinate value, and the transformation is called the transformation into
rank space. We use arrays $C_0, C_1, \ldots, C_{d-1}$, each of length n. The array $C_i$ stores
the i-th coordinate values of points in P in increasing order.

By using the point set P' on the rank space and the arrays $C_i$ (i = 0, ..., d − 1)
that contain the original coordinate values of the points in P, we can reduce the
problem of orthogonal range search on the original point set P into that on P'.


Assume that a query range $Q = [l_0^{(Q)}, u_0^{(Q)}] \times \cdots \times [l_{d-1}^{(Q)}, u_{d-1}^{(Q)}] \subset \mathbb{R}^d$ is given for
a point set P. From the construction of P', there exists a range
$Q' = [l_0^{(Q')}, u_0^{(Q')}] \times \cdots \times [l_{d-1}^{(Q')}, u_{d-1}^{(Q')}] \subset [n]^d$ such that

$p \in Q \iff p' \in Q'.$

The boundaries of this Q' are computed by

$l_i^{(Q')} = \#\{\, p \in P \mid p_i < l_i^{(Q)} \,\},$

$u_i^{(Q')} = \#\{\, p \in P \mid p_i \le u_i^{(Q)} \,\} - 1.$

These are computed in $O(d \lg n)$ time by binary searches on the arrays $C_i$. Then,
the counting query is performed by using Q'. For the reporting query, after finding a
point $p' \in P'$ which is included in the query range Q' in the rank space, we need to
recover the original coordinates of the point $p \in P$. This is done in O(d) time using
the arrays $C_i$ containing the coordinates of the original points by

$p_i = C_i[p'_i].$

Thus, an orthogonal range search problem on $\mathbb{R}^d$ can be transformed into that on
$[n]^d$. Note that if coordinates are transformed as in Eq. (8.1), the identical coordinate
values in $\mathbb{R}^d$ are transformed into identical coordinate values in $[n]^d$. By shifting val-
ues by one for the identical coordinate values, we can transform the coordinate values
so that for any two distinct points $p', q' \in P'$ and any $i \in [d]$, the i-th coordinate
value $p'_i$ of p' is different from the i-th coordinate value $q'_i$ of q'.
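
The transformation into rank space and the mapping of a query range can be sketched with the standard bisect module. The function names are chosen here for illustration. Ties are resolved by a stable sort, which is one way to realize the shifting of identical coordinate values described above.

```python
from bisect import bisect_left, bisect_right


def to_rank_space(points):
    """Return (P', C): rank-space points and the sorted coordinate arrays C_i."""
    n, d = len(points), len(points[0])
    C = [sorted(p[i] for p in points) for i in range(d)]
    ranked = [[0] * d for _ in range(n)]
    for i in range(d):
        # Stable sort: equal coordinate values receive consecutive, distinct ranks.
        order = sorted(range(n), key=lambda j: points[j][i])
        for rank, j in enumerate(order):
            ranked[j][i] = rank
    return [tuple(p) for p in ranked], C


def query_to_rank_space(C, lo, hi):
    """Map Q to Q' with one binary search per boundary, O(d lg n) in total."""
    lo_r = [bisect_left(C[i], lo[i]) for i in range(len(C))]       # #{p : p_i < l_i}
    hi_r = [bisect_right(C[i], hi[i]) - 1 for i in range(len(C))]  # #{p : p_i <= u_i} - 1
    return lo_r, hi_r


def recover(C, p_ranked):
    """Recover the original coordinates of a rank-space point: p_i = C_i[p'_i]."""
    return tuple(C[i][r] for i, r in enumerate(p_ranked))


if __name__ == "__main__":
    P = [(2.5, 10.0), (0.1, 30.0), (7.0, 20.0)]
    P_ranked, C = to_rank_space(P)
    lo_r, hi_r = query_to_rank_space(C, (1.0, 5.0), (8.0, 25.0))
    hits = [p for p in P_ranked if all(l <= x <= u for x, l, u in zip(p, lo_r, hi_r))]
    print(P_ranked, lo_r, hi_r, [recover(C, p) for p in hits])
```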

If the original points have integer coordinate values, we can reduce the space [19].
Consider the case where P is a point set on $[U]^d$, that is, each coordinate value takes
an integer value from 0 to U − 1. In this case, the point set P' in the rank space
does not change. However, we store the coordinates of the original point set P in
a different way. We store them using multi-sets $M_0, M_1, \ldots, M_{d-1}$, each of which
corresponds to one of the d dimensions. The multi-set $M_i$ stores the i-th coordinate
values of the points in P. We use the data structure of [22] to store multi-sets.

Lemma 8.1 There exists a data structure using $n \lg(U/n) + O(n)$ bits which supports
a $\mathrm{select}_m$ query on a multi-set $M_i$ in constant time.


A $\mathrm{select}_m$ query on a multi-set M finds the j-th smallest element in M. That is, $C_i[j]$
is obtained by finding the j-th smallest element in the multi-set $M_i$. Therefore, if a query
range Q on $[U]^d$ is given, it can be transformed into a query range Q' on the rank
space by binary searches using $\mathrm{select}_m$ queries, and the original coordinate values
are obtained by d $\mathrm{select}_m$ queries.

Assume that there exists a succinct data structure D' for a point set P' on $[n]^d$.
Then, the space complexity of D' is $(d - 1) n \lg n + (d - 1) \cdot o(n \lg n)$ bits, as shown
in Sect. 8.2.1. If we add d data structures of Lemma 8.1, the total space complexity
becomes $dn \lg U - n \lg n + (d - 1) \cdot o(n \lg n)$ bits. This is succinct for the point set
P on $[U]^d$. Therefore, if there exists a succinct data structure for a point set on $[n]^d$,
we can construct a succinct data structure for a point set on $[U]^d$. From here onward,
we consider only point sets on $[n]^d$.


8.3 kd-Tree 


kd-tree [1] is a well-known data structure that partitions the space recursively. It 
is used not only for the orthogonal range search problem, but also for the nearest 
neighbor search problem. 


8.3.1 Construction of kd-Trees


We explain the algorithm for constructing a kd-tree of a point set P for the two-
dimensional case. First, we find the point p for which the x-coordinate is the median
of the point set P, and store p at the root of the kd-tree. Next, we divide the set
$P \setminus \{p\}$ into two: the set $P_{left}$ that stores points with x-coordinates smaller than that
of p, and the set $P_{right}$ that stores points with x-coordinates larger than that of p. We
add two children $v_{left}$, $v_{right}$ to the root of the kd-tree. Next, from $P_{left}$ ($P_{right}$), we find
$p_{left}$ ($p_{right}$) for which the y-coordinate is the median of the set, and we store $p_{left}$
($p_{right}$) in $v_{left}$ ($v_{right}$). Similarly, we divide the set $P_{left} \setminus \{p_{left}\}$ ($P_{right} \setminus \{p_{right}\}$) into
two subsets according to y-coordinates, find medians with respect to x-coordinates,
and store them in children of $v_{left}$ ($v_{right}$), and repeat this recursively. Figure 8.2 shows
an example of partitioning a point set.

For a d-dimensional space, we partition the space based on the first dimension, 
the second dimension, and so on. After partitioning the space based on the d-th 
dimension, we use the first dimension again. 
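
The construction can be sketched recursively. This is a minimal illustration, not an optimized implementation: the Node class and function name are chosen here, each level re-sorts its points, and for an even number of points the upper median is taken, which is an assumption since the text does not say which median is meant.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    point: Tuple[int, ...]   # the point V(v) stored in this node
    left: Optional["Node"]   # subtree with smaller coordinate on the split axis
    right: Optional["Node"]  # subtree with larger coordinate on the split axis
    size: int                # number of points in the subtree rooted here


def build_kd_tree(points, depth=0):
    """Build a kd-tree; a node at depth l splits on dimension l mod d."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # upper median when the number of points is even
    return Node(pts[mid],
                build_kd_tree(pts[:mid], depth + 1),
                build_kd_tree(pts[mid + 1:], depth + 1),
                len(pts))


if __name__ == "__main__":
    P = [(0, 4), (1, 2), (2, 7), (3, 5), (4, 0), (5, 3), (6, 1), (7, 6)]
    root = build_kd_tree(P)
    print(root.point, root.left.point, root.right.point)
```

Storing the subtree size in each node is a convenience for counting queries and is not required by the construction itself.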

Fig. 8.2 Partitioning of a space based on the point set (left) and the corresponding kd-tree (right). 
Points A, B, C, ... correspond to nodes a, b, c, .... The range corresponding to node h is shown 
in gray in the left figure 


8.3.2 Range Search Algorithm 


An important concept for understanding range searches using a kd-tree is the cor- 
respondence between nodes of the kd-tree and ranges. In Sect. 8.3.1, we explained 
that each node of the kd-tree stores a point. We can also consider that each node 
corresponds to an orthogonal range. Let V (v) denote the point in P stored in node 
v and R(v) denote the corresponding range. Then R(v) is defined as follows: 


- For the root node r of the kd-tree, the range R(r) is the whole space.

- For a node v at depth l, the range $R(v_{left})$ for the left child $v_{left}$ of v is obtained as
follows. We partition R(v) into two by the hyperplane that is perpendicular to the
(l mod d)-th axis and contains V(v). Then, $R(v_{left})$ is the range with the smaller (l
mod d)-th coordinate value and $R(v_{right})$ is the range with the larger (l mod d)-th
coordinate value.


For example, in Fig. 8.2, the range R(h) corresponding to node h is the gray area. 

The algorithm for reporting queries using a kd-tree is as follows. The algorithm
searches the space by traversing tree nodes from the root. Each time a node v is visited,
the algorithm checks whether the corresponding point $V(v) \in P$ is contained in
the query range Q or not. If the range R(v) is fully contained in the query range
Q, the algorithm outputs all the points stored in the sub-tree rooted at v. If R(v)
and Q have no intersection, the algorithm terminates the search of the sub-tree. For
a counting query, instead of outputting all the points when R(v) is contained in Q,
the algorithm finds and accumulates the size of the sub-tree rooted at v. Although it
may seem impossible to execute the algorithm since the range R(v) for node v is not
explicitly stored in the kd-tree, if the range R(v) for node v is known, then we know
the coordinate values of the hyperplane partitioning the range from the coordinate
values of point V(v), and we can compute $R(v_{left})$ and $R(v_{right})$. Therefore, we can
execute the algorithm by keeping the range R(v) during the search.
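
The search with an implicitly maintained range R(v) can be sketched as follows. The sketch is self-contained and therefore repeats a compact tree construction; the dictionary layout, the function names, and the explicit space_lo and space_hi arguments describing the whole space are assumptions made for illustration.

```python
def build(points, depth=0):
    """Compact kd-tree construction (upper median, split axis = depth mod d)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid], "size": len(pts),
            "left": build(pts[:mid], depth + 1),
            "right": build(pts[mid + 1:], depth + 1)}


def collect(node, out):
    """Append every point stored in the subtree rooted at node."""
    if node is not None:
        out.append(node["point"])
        collect(node["left"], out)
        collect(node["right"], out)


def search(node, q_lo, q_hi, r_lo, r_hi, depth=0, out=None):
    """Return the number of points in Q below node; also fill out if it is a list."""
    if node is None:
        return 0
    d = len(q_lo)
    if any(r_hi[i] < q_lo[i] or q_hi[i] < r_lo[i] for i in range(d)):
        return 0  # R(v) and Q do not intersect
    if all(q_lo[i] <= r_lo[i] and r_hi[i] <= q_hi[i] for i in range(d)):
        if out is not None:
            collect(node, out)
        return node["size"]  # R(v) is fully contained in Q
    p = node["point"]
    hit = all(q_lo[i] <= p[i] <= q_hi[i] for i in range(d))
    if hit and out is not None:
        out.append(p)
    axis = depth % d
    left_hi = list(r_hi)
    left_hi[axis] = p[axis]    # R(v_left): smaller side of the hyperplane
    right_lo = list(r_lo)
    right_lo[axis] = p[axis]   # R(v_right): larger side of the hyperplane
    return (int(hit)
            + search(node["left"], q_lo, q_hi, r_lo, left_hi, depth + 1, out)
            + search(node["right"], q_lo, q_hi, right_lo, r_hi, depth + 1, out))


def count(root, q_lo, q_hi, space_lo, space_hi):
    return search(root, q_lo, q_hi, list(space_lo), list(space_hi))


def report(root, q_lo, q_hi, space_lo, space_hi):
    out = []
    search(root, q_lo, q_hi, list(space_lo), list(space_hi), out=out)
    return out


if __name__ == "__main__":
    P = [(0, 4), (1, 2), (2, 7), (3, 5), (4, 0), (5, 3), (6, 1), (7, 6)]
    root = build(P)
    print(sorted(report(root, (1, 2), (5, 5), (0, 0), (7, 7))))
    print(count(root, (1, 2), (5, 5), (0, 0), (7, 7)))
```

Only the two boundary vectors of R(v) are passed down the recursion, so no range has to be stored in the tree.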


8.3.3 Complexity Analyses


The time complexity of kd-trees is analyzed in [13]. A counting query takes
$O(d^2 n^{(d-1)/d})$ time. In general, we assume d is a constant and write the com-
plexity as $O(n^{(d-1)/d})$. For a reporting query, we output all coordinates of points in
Q. Because a point can be output in constant time, the query time complexity is
$O(n^{(d-1)/d} + k)$.

If d > lg n, the height of the kd-tree is at most d, and therefore the space is
partitioned at most d times. Then, it is necessary to traverse all the nodes and a query
takes O(n) time.


8.4 Wavelet Tree 


Wavelet tree is a succinct data structure supporting various queries on strings and 
integer sequences efficiently. It was originally proposed for representing compressed 
suffix arrays [9], but it later became known that wavelet tree can support more 
operations [18]. Orthogonal range search in two-dimensional space is one of these 
operations [16]. 


8.4.1 Construction 


The two-dimensional point sets P that can be represented directly using the wavelet tree
are those where the coordinates take integer values from 0 to n − 1 and the x-coordinate
values are all distinct. As explained in Sect. 8.2.2, without loss of generality, we can
transform any point set into a point set in $[n]^2$ space. For such a two-dimensional
point set P, consider an integer sequence C that contains the y-coordinates of the 
points in increasing order of x-coordinates. For example, for the point set in Fig. 8.3, 
the corresponding integer sequence C is 4, 2, 7, 5, 0, 3, 1, 6. For this sequence C, 
we construct a wavelet tree as follows. 

First, we consider that the root of the wavelet tree corresponds to C. Note that 
we do not store C directly in the wavelet tree. We then focus on the most significant 
(highest) bit of the $\lceil \lg n \rceil$-bit binary representation of each integer in C. If it is 0
(1), the integer is moved into the left (right) child of the root. We consider that each 
child node of the root corresponds to an integer sequence containing the numbers 
in the original array C in the same order. For example, in the example in Fig. 8.3, 
integers from 0 to 3 go to the left child, and integers from 4 to 7 go to the right child. 
Therefore, the left child corresponds to an integer sequence 2, 0, 3, 1, and the right 
child 4, 7, 5, 6. 

Fig. 8.3 A two-dimensional point set P (left) and the corresponding wavelet tree (right)

Next, for each integer sequence of child nodes, we focus on the second most
significant bit of the binary representation of each number. We move a number with
a 0 bit to the left, and a number with a 1 bit to the right. Similarly, we repeat this until
the integer sequence of a node consists of identical integers.

Note that we do not store integer sequences in nodes of the wavelet tree. In
each node, we store a bit string of the same length as the corresponding integer
sequence. The i-th bit of the bit string is 0 (1) if the i-th integer in the integer
sequence goes to the left (right) child. In other words, a bit string stored in a node of
depth l is the concatenation of the (l + 1)-th highest bit of each integer in the integer
sequence corresponding to the node. In the example in Fig. 8.3, the integer sequence
corresponding to the root node is 4, 2, 7, 5, 0, 3, 1, 6, and because integers from 0
to 3 go to the left child and integers from 4 to 7 go to the right child, the bit string
stored in the root node is 1, 0, 1, 1, 0, 0, 0, 1. Note that we do not store bit strings
at leaf nodes. We show the information stored in the wavelet tree in the right tree in
Fig. 8.3. Only bit strings drawn above the dark gray rectangles, that is, those in the
lower row of each node, are stored.

Note that although it may seem impossible to recover the original information
(the integer sequence) from these bit strings, it is possible. Consider the recovery of
the fourth integer of the wavelet tree in Fig. 8.3 (right). From the bit string stored
in the root node, we know that the first bit of the integer is 1. Because this 1 bit
corresponding to the fourth integer is the third 1 in the bit string, we know that the
integer to be recovered corresponds to the third bit of the bit string in the right child
of the root node. If we look at the third bit of the right child, we know that the second
bit of the integer is 0. Further, because this 0 bit is the second 0 in the bit string, the
integer to be recovered corresponds to the second bit of the left child of the current node.
Finally, from the second bit of the left child, we know the last bit of the integer to be
recovered is 1. Therefore, the fourth integer is 101 in binary, that is, 5. This is shown
in Fig. 8.4.

Fig. 8.4 In the wavelet tree in Fig. 8.3, we recover the fourth integer. By looking at
the bits enclosed in boxes, we know that the fourth integer is 101 in binary, that is, 5
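The construction and the recovery of an integer can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: the names Node,
build, rank, and access are hypothetical, positions are 0-based (so the fourth integer
has index 3), and rank scans the bit string naively instead of using the constant-time
structure introduced below.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    bits: List[int]                      # bit string stored in this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(seq: List[int], nbits: int) -> Optional[Node]:
    """Build the node for seq; nbits low-order bits are still undecided."""
    if nbits == 0 or not seq:
        return None                      # leaves store no bit string
    shift = nbits - 1
    node = Node([(v >> shift) & 1 for v in seq])
    node.left = build([v for v in seq if not (v >> shift) & 1], shift)
    node.right = build([v for v in seq if (v >> shift) & 1], shift)
    return node


def rank(bits: List[int], b: int, i: int) -> int:
    """Number of bits equal to b in bits[0..i] (naive scan, O(i) time)."""
    return bits[: i + 1].count(b)


def access(root: Node, i: int, nbits: int) -> int:
    """Recover the i-th integer (0-based) from the stored bit strings only."""
    value, node = 0, root
    for _ in range(nbits):
        b = node.bits[i]
        value = (value << 1) | b
        i = rank(node.bits, b, i) - 1    # position of the same integer in the child
        node = node.right if b else node.left
    return value


C = [4, 2, 7, 5, 0, 3, 1, 6]             # the sequence of Fig. 8.3
root = build(C, 3)
fourth = access(root, 3, 3)              # follows the path described in the text
```

Replacing the naive rank by a constant-time rank structure is what makes each step
of access take constant time.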


In this recovery operation, we need to compute the number of zeros/ones in the 
first i bits of a bit string. This operation is also used in the range search algorithm 
in the next section. If we look at bits one by one from the beginning of a bit string, 
it takes O(n) time, which is too slow. We therefore represent the bit string of each 
node by the following data structure [5, 12]. 


Lemma 8.2 For a bit string of length n, there exists a data structure using n +
o(n) bits which answers a rank/select query in constant time, where the rank query
rank_b(B, i) is to count the number of b bits (b = 0, 1) in the bits from B[0] to B[i]
(i ≥ 0) of a bit string B, and the select query select_b(B, i) is to return the position
of the i-th b (i ≥ 1, b = 0, 1) in a bit string B.


The select query is also necessary for range searches using a wavelet tree. 
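To make the semantics of the two queries concrete, the following Python sketch
implements them with a plain prefix-count table. It is not the structure of
Lemma 8.2: the table costs O(n lg n) bits and select uses binary search, so it takes
O(lg n) time. The class name RankSelect and the convention of returning −1 for a
missing occurrence are assumptions made for illustration.

```python
from itertools import accumulate


class RankSelect:
    def __init__(self, bits):
        self.n = len(bits)
        # ones[j] = number of 1 bits among bits[0..j-1]
        self.ones = [0] + list(accumulate(bits))

    def rank(self, b, i):
        """Number of bits equal to b in B[0..i]; rank(b, -1) is 0."""
        ones = self.ones[i + 1]
        return ones if b == 1 else (i + 1) - ones

    def select(self, b, i):
        """Position of the i-th (i >= 1) bit equal to b, or -1 if there is none."""
        if self.n == 0 or self.rank(b, self.n - 1) < i:
            return -1
        lo, hi = 0, self.n - 1
        while lo < hi:                   # smallest position p with rank(b, p) >= i
            mid = (lo + hi) // 2
            if self.rank(b, mid) >= i:
                hi = mid
            else:
                lo = mid + 1
        return lo


rs = RankSelect([1, 0, 1, 1, 0, 0, 0, 1])   # root bit string of Fig. 8.3
zeros_up_to_6 = rs.rank(0, 6)
third_one = rs.select(1, 3)
```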


8.4.2 Range Search Algorithm 


We explain how to solve the two-dimensional range search problem using a wavelet
tree. First, we explain the counting query, which is performed by a recursive function
as in Algorithm 1. For a query range Q = [l, r] × [b, t], the function is called as
WTCOUNTING(l, r, b, t, v_root, 0, 2^⌈lg n⌉ − 1), where v_root is the root node of the
wavelet tree. The left (right) child of node v is represented by v_left (v_right). The bit
string stored in node v is represented by v.B.

We explain the algorithm using the example in Fig. 8.5 of searching a range
Q = [1, 6] × [1, 4] for the point set P in Fig. 8.3.

The search algorithm traverses the tree from the root. During the search, the
algorithm keeps the interval I of an integer sequence (or bit string) corresponding
to the interval of the x-coordinate of the query range. In the example in Fig. 8.5,
we focus on the interval I = [1, 6] at the root node. To move to the left child, we
need to compute the interval corresponding to the query range. This is done by a
rank query that counts the number of zeros from the beginning of the bit string to a
specified position. In the bit string stored in the root node, the number of zeros from
the beginning to the 0-th position (in general, if the interval is I = [l, r], to the
(l − 1)-th position) is 0, so we know the interval corresponding to the query starts at
position 0. Because the number of zeros from the beginning to the 6-th position (in
general, if the interval is I = [l, r], to the r-th position) is four, we know the interval
ends at position 3. Thus, we obtain the interval I = [0, 3] for the left child. Similarly,
for the right child, by using rank queries counting the number of ones, we can obtain
the interval I = [1, 2].

Fig. 8.5 Behavior of the algorithm when searching a range of [1, 6] × [1, 4] for the
two-dimensional point set in Fig. 8.3

Algorithm 1 WTCOUNTING(x_1, x_2, y_1, y_2, v, a, b)

Input: A node v of the wavelet tree and an interval [x_1, x_2] in the corresponding bit string, the
interval [a, b] of y coordinate corresponding to node v, and the interval [y_1, y_2] of y coordinate
for the query range.

Output: The number of points stored in the sub-tree rooted at v and contained in Q.

 1: if x_1 > x_2 then
 2:     return 0
 3: else if [a, b] ∩ [y_1, y_2] = ∅ then
 4:     return 0
 5: else if [a, b] ⊆ [y_1, y_2] then
 6:     return x_2 − x_1 + 1
 7: end if
 8: x_1^l ← rank_0(v.B, x_1 − 1)
 9: x_2^l ← rank_0(v.B, x_2) − 1
10: x_1^r ← x_1 − x_1^l
11: x_2^r ← x_2 − x_2^l − 1
12: m ← ⌊(a + b)/2⌋
13: return WTCOUNTING(x_1^l, x_2^l, y_1, y_2, v_left, a, m)
        + WTCOUNTING(x_1^r, x_2^r, y_1, y_2, v_right, m + 1, b)

We repeat this process by going down the tree maintaining an interval. When we
reach a leaf, we can determine if the y-coordinate of the point is included in the query
range. However, we can sometimes determine this at an earlier stage. For example,
in Fig. 8.5, after obtaining the interval I = [0, 3] at the left child of the root, for the
right child of the current node the interval of the y-coordinate corresponding to the
node is [2, 3], which is completely included in the interval [1, 4] of the y-coordinate
of the query range. Therefore, for the two points we focus on at this node, both
the x- and y-coordinates are included in the query range, and we have found two points
in the query range. In contrast, after computing the interval I = [1, 2] for the right
child of the root, the interval of the y-coordinate corresponding to the right child of
the current node is [6, 7], which has no intersection with the interval [1, 4] of the
y-coordinate of the query range. We do not need to search that sub-tree further.

As observed above, in a range search using a wavelet tree, if the query range is
Q = [l, r] × [b, t], we first focus on points for which the x-coordinates are contained
in Q, that is, contained in the range [l, r] × [0, n − 1]. Next, the process of traversing
down the tree corresponds to partitioning the range into two according to the
y-coordinate. If an obtained range is completely contained in the query range, or does
not intersect with the query range, we terminate searching the sub-tree.
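The counting procedure can be tried out with the following Python sketch, which
mirrors the three early exits and the interval updates of Algorithm 1. It rests on
assumptions made for brevity: nodes are plain tuples (bits, left, right), the helper
names build, rank0, and wt_counting are hypothetical, coordinates are 0-based, and
rank0 scans the bit string instead of answering in constant time.

```python
def build(seq, a, b):
    """Node for the integers of seq, all of which lie in the y-interval [a, b]."""
    if a == b or not seq:
        return None                      # leaves (and empty nodes) store nothing
    m = (a + b) // 2
    bits = [0 if y <= m else 1 for y in seq]
    left = build([y for y in seq if y <= m], a, m)
    right = build([y for y in seq if y > m], m + 1, b)
    return (bits, left, right)


def rank0(bits, i):
    """Number of zeros in bits[0..i]; 0 when i = -1 (naive scan)."""
    return bits[: i + 1].count(0)


def wt_counting(x1, x2, y1, y2, v, a, b):
    if x1 > x2:
        return 0                         # empty interval in this node
    if b < y1 or y2 < a:
        return 0                         # [a, b] and [y1, y2] are disjoint
    if y1 <= a and b <= y2:
        return x2 - x1 + 1               # [a, b] is contained in [y1, y2]
    bits, left, right = v
    xl1 = rank0(bits, x1 - 1)
    xl2 = rank0(bits, x2) - 1
    xr1 = x1 - xl1
    xr2 = x2 - xl2 - 1
    m = (a + b) // 2
    return (wt_counting(xl1, xl2, y1, y2, left, a, m)
            + wt_counting(xr1, xr2, y1, y2, right, m + 1, b))


C = [4, 2, 7, 5, 0, 3, 1, 6]             # the sequence of Fig. 8.3
root = build(C, 0, 7)
count = wt_counting(1, 6, 1, 4, root, 0, 7)   # query [1, 6] x [1, 4]
```

A leaf is never unpacked, because at a leaf the interval [a, b] is a single value and
one of the first three tests always applies.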

For counting queries, it is sufficient to sum the number of points. For reporting 
queries, the extra work of computing the coordinates of the points is also required. 
This is shown in Algorithm 2. 

Algorithm 2 WTREPORTING(x_1, x_2, y_1, y_2, v, a, b)

Input: A node v of the wavelet tree, the interval [x_1, x_2] of the bit string stored in it, the interval
[a, b] of y coordinates corresponding to the range for v, and the interval [y_1, y_2] of y coordinates
for the query range.

Output: Coordinates of points stored in the sub-tree rooted at v and contained in Q.

 1: if x_1 > x_2 then
 2:     terminate
 3: else if [a, b] ∩ [y_1, y_2] = ∅ then
 4:     terminate
 5: else if [a, b] ⊆ [y_1, y_2] then
 6:     for i = x_1 to x_2 do
 7:         x ← WTREPORTX(v, i)
 8:         y ← WTREPORTY(v, i, a, b)
 9:         Output (x, y)
10:     end for and terminate
11: end if
12: x_1^l ← rank_0(v.B, x_1 − 1)
13: x_2^l ← rank_0(v.B, x_2) − 1
14: x_1^r ← x_1 − x_1^l
15: x_2^r ← x_2 − x_2^l − 1
16: m ← ⌊(a + b)/2⌋
17: WTREPORTING(x_1^l, x_2^l, y_1, y_2, v_left, a, m)
18: WTREPORTING(x_1^r, x_2^r, y_1, y_2, v_right, m + 1, b)


Algorithm 3 WTREPORTX(v, i)

Input: A node v of the wavelet tree and an integer i.

Output: The x coordinate value of the point corresponding to the i-th bit of the bit string stored in
v.

 1: if v is the root then
 2:     return i
 3: else if v is the left child of v_parent then
 4:     i ← select_0(v_parent.B, i + 1)
 5:     return WTREPORTX(v_parent, i)
 6: else
 7:     i ← select_1(v_parent.B, i + 1)
 8:     return WTREPORTX(v_parent, i)
 9: end if


Algorithm 4 WTREPORTY(v, i, a, b)

Input: A node v of the wavelet tree, the interval [a, b] of y coordinate corresponding to the range
for v, and an integer i.

Output: The y coordinate value of the point corresponding to the i-th bit of the bit string stored in
v.

 1: if a = b then
 2:     return a
 3: else if v.B[i] = 0 then
 4:     i ← rank_0(v.B, i) − 1
 5:     return WTREPORTY(v_left, i, a, ⌊(a + b)/2⌋)
 6: else
 7:     i ← rank_1(v.B, i) − 1
 8:     return WTREPORTY(v_right, i, ⌊(a + b)/2⌋ + 1, b)
 9: end if


The outline of the reporting query is the same as that of the counting query. In
Algorithm 1, we obtain the number of points in line 6. For a reporting query, we
instead output, one by one, the coordinates of the points corresponding to the interval
[x_1, x_2] of the bit string v.B. The x- and y-coordinates of each point are obtained by
WTREPORTX and WTREPORTY, respectively. The algorithm WTREPORTY for computing the
y-coordinate (Algorithm 4) is similar to the algorithm for recovering a value of the
original integer array explained in Sect. 8.4.1. We compute the y-coordinates by
traversing down the tree using rank queries.

In contrast, the algorithm WTREPORTX for computing the x-coordinate
(Algorithm 3) traverses up the tree using select queries. We explain this by example.
In Fig. 8.5, assume that at node v, which is the right child of the left child of the root,
we find that points corresponding to the interval I = [0, 1] are contained in the query
range. Consider the computation of the x-coordinate of the point corresponding to
the bit v.B[1]. First, the node v we focus on is the right child of its parent. We find
the position of the second 1 in the parent by a select query. Then we know that the
point corresponds to the bit v'.B[2] in the parent node v'. Next, because the current
node is the left child of its parent (the root), we find the position of the third 0 in the
bit string of the parent by a select query. Now we know that the point corresponds
to the bit r.B[5] at the root node r. That is, the x-coordinate of the point is 5.

As shown above, we can traverse the nodes of the wavelet tree using rank and
select queries on bit strings. For range searches, we traverse down the tree from the
root computing the intervals of the x-coordinate corresponding to the query range.
If we find a node where the corresponding interval of the y-coordinate is contained
in the query range, we answer the query by computing the length of the interval or
coordinate values by traversing the tree.
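The reporting query, including the upward and downward traversals, can be sketched
as follows. The sketch makes several assumptions for illustration: every node keeps a
parent pointer, leaves are represented by nodes with an empty bit string, rank and
select scan the bit string naively, and the traversals are written as loops rather than
as the recursions of Algorithms 3 and 4. All names are hypothetical.

```python
class Node:
    def __init__(self, seq, a, b, parent=None):
        self.parent, self.left, self.right, self.bits = parent, None, None, []
        if a < b and seq:
            m = (a + b) // 2
            self.bits = [0 if y <= m else 1 for y in seq]
            self.left = Node([y for y in seq if y <= m], a, m, self)
            self.right = Node([y for y in seq if y > m], m + 1, b, self)


def rank(bits, bit, i):
    return bits[: i + 1].count(bit)      # occurrences of bit in bits[0..i]


def select(bits, bit, k):
    return [p for p, x in enumerate(bits) if x == bit][k - 1]   # k-th occurrence


def report_x(v, i):
    """Walk up to the root with select queries (cf. Algorithm 3)."""
    while v.parent is not None:
        bit = 0 if v is v.parent.left else 1
        i = select(v.parent.bits, bit, i + 1)
        v = v.parent
    return i


def report_y(v, i, a, b):
    """Walk down to a leaf with rank queries (cf. Algorithm 4)."""
    while a < b:
        m, bit = (a + b) // 2, v.bits[i]
        i = rank(v.bits, bit, i) - 1
        if bit == 0:
            v, b = v.left, m
        else:
            v, a = v.right, m + 1
    return a


def wt_reporting(x1, x2, y1, y2, v, a, b, out):
    if x1 > x2 or b < y1 or y2 < a:
        return
    if y1 <= a and b <= y2:
        for i in range(x1, x2 + 1):
            out.append((report_x(v, i), report_y(v, i, a, b)))
        return                           # do not descend after reporting
    xl1 = rank(v.bits, 0, x1 - 1)
    xl2 = rank(v.bits, 0, x2) - 1
    m = (a + b) // 2
    wt_reporting(xl1, xl2, y1, y2, v.left, a, m, out)
    wt_reporting(x1 - xl1, x2 - xl2 - 1, y1, y2, v.right, m + 1, b, out)


C = [4, 2, 7, 5, 0, 3, 1, 6]             # the sequence of Fig. 8.3
root = Node(C, 0, 7)
points = []
wt_reporting(1, 6, 1, 4, root, 0, 7, points)
```

Calling report_x(root.left.right, 1) retraces the worked example in the text: the
second 1 in the parent, then the third 0 in the root.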


8.4.3 Complexity Analyses


We now analyze the space complexity of the wavelet tree and query time complexities 
for the orthogonal range search problem. 

First, we analyze the space complexity. The height of the wavelet tree is ⌈lg n⌉.
The total length of bit strings stored in the nodes with the same depth is always n.
Therefore, the total length of all the bit strings in the wavelet tree is n lg n. We can
concatenate all the bit strings and store only one long bit string. Then it is not necessary
to store the tree structure of the wavelet tree. By using the data structure of Lemma 8.2
for this long bit string, the space complexity is n lg n + o(n lg n) bits in total.
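The level-by-level layout can be illustrated with a short Python sketch that produces,
for each depth, the concatenation of the bit strings of all nodes at that depth. The
function name level_bitstrings is hypothetical, and the sketch assumes the integers
lie in [0, 2^h − 1] with h = ⌈lg n⌉; it relies on Python's sort being stable to keep the
original order inside each node.

```python
from math import ceil, log2


def level_bitstrings(seq):
    """Return one bit list per depth; each has the same length as seq."""
    n = len(seq)
    h = max(1, ceil(log2(n)))
    levels, order = [], list(seq)
    for depth in range(h):
        shift = h - 1 - depth
        # order is grouped by node (left to right); take the next bit of each integer
        levels.append([(y >> shift) & 1 for y in order])
        # a stable sort on the top depth+1 bits moves every integer into its child
        order.sort(key=lambda y: y >> shift)
    return levels


levels = level_bitstrings([4, 2, 7, 5, 0, 3, 1, 6])
total_bits = sum(len(level) for level in levels)   # n bits on each of the h levels
```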

Next, we consider query time complexities. For a counting query, we consider the
number of visited nodes. In the wavelet tree, each time we traverse an edge toward
a leaf, points with small y-coordinates go to the left child, and points with large
y-coordinates go to the right child. At the leaves, we can consider that all points are sorted
in increasing order of y-coordinates. This means that the leaf nodes corresponding to
the interval of y-coordinates of the query range are consecutive in the
wavelet tree. Now, consider the set M of nodes of the wavelet tree defined as follows.
The set M consists of the maximal nodes v such that the y-coordinates corresponding to
the leaf nodes in the sub-tree rooted at v are contained in the query range, that is,
the y-coordinates of the leaves in the sub-tree of v are contained in the query range
but the sub-tree of the parent of v contains some node for which the corresponding
y-coordinate is not contained in the query range. This is the set of nodes from which
we do not further search the sub-tree for a counting query using the wavelet tree, and
in Fig. 8.6, it is shown as dark gray nodes.

Let A be the set of nodes that are ancestors of nodes of M. This is the set of nodes
visited before reaching nodes of M, which are shown as light gray nodes in Fig. 8.6.
The number of nodes visited in a counting query is then |A| + |M|. We now consider
the size of M and A.

For the size of the set M, the following lemma holds. 


Lemma 8.3 It holds that |M| = O(lg n).


Proof (Lemma 8.3) The set M is constructed as follows. Let M' be the set of leaf
nodes of the wavelet tree corresponding to the interval of y-coordinates in the query
range. For the nodes of M', if two nodes v_1 and v_2 have a common parent node v,
we remove v_1 and v_2 from M' and add v to M'. By repeating this process until there
are no such pairs of nodes, the set M' coincides with M.

For each depth of the wavelet tree, the number of nodes of that depth belonging to M
is then at most two, because if there exist more than two nodes, two of them must
have the same parent. This completes the proof that |M| = O(lg n).

Fig. 8.6 Nodes visited by a counting query for the interval of y coordinates to search.
We traverse light gray nodes, and when we reach a dark gray node, we do not further
search the nodes below it


For the size of the set A, the following lemma holds. 
Lemma 8.4 It holds |A| = O(lg n). 


Proof (Lemma 8.4) Consider a node v in the set A. In the set of leaf nodes in the
sub-tree rooted at v, there must exist a leaf node where the corresponding y-coordinate is
included in the query range and a leaf node where the corresponding y-coordinate is
not included in the query range. Therefore, for each depth of the wavelet tree, there
are at most two such nodes in A, because if there exist more than two such nodes, for
a node in the middle, the corresponding y-coordinates of the leaves in the sub-tree
rooted at that node are all contained in the query range. This completes the proof that
|A| = O(lg n).


From the above discussion, the number of nodes visited in a counting query is
|A| + |M| = O(lg n). When we visit a new node, we use a constant number of rank
queries. Because a rank query takes constant time (Lemma 8.2), the time complexity
of a counting query using the wavelet tree is O(lg n).
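The bound of two nodes per depth for both M and A can be checked experimentally.
The Python sketch below is an illustration only: it ignores the x-intervals (which can
only prune more nodes) and works on the implicit binary tree over the y-range
[0, 2^h − 1]; the function name visit and the dictionaries M and A, keyed by depth,
are hypothetical.

```python
def visit(a, b, y1, y2, depth, M, A):
    """Classify the nodes touched by a counting query with y-interval [y1, y2]."""
    if b < y1 or y2 < a:
        return                                   # disjoint: pruned immediately
    if y1 <= a and b <= y2:
        M.setdefault(depth, []).append((a, b))   # maximal contained node
        return
    A.setdefault(depth, []).append((a, b))       # partially covered ancestor
    m = (a + b) // 2
    visit(a, m, y1, y2, depth + 1, M, A)
    visit(m + 1, b, y1, y2, depth + 1, M, A)


M, A = {}, {}
visit(0, 31, 3, 26, 0, M, A)
worst_per_depth = max(len(nodes) for nodes in list(M.values()) + list(A.values()))
```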

For a reporting query, it is necessary to compute coordinates of points in the
query range. As explained in Sect. 8.4.2, x-coordinates are computed by traversing
up the tree and y-coordinates are computed by traversing down the tree, with the
coordinates of each point computed by visiting O(lg n) nodes. Moving to an adjacent
node in the wavelet tree is done by a constant number of rank/select queries, and
each rank/select query takes constant time (Lemma 8.2). Therefore, the coordinates
of a point are obtained in O(lg n) time, and the time complexity for a reporting query
using the wavelet tree is O((1 + k) lg n), where k is the number of output points.

We obtain the following theorem.

Theorem 8.1 The space complexity of the wavelet tree representing a two-dimensional
point set on [n]^2 is n lg n + o(n lg n) bits; a counting query takes
O(lg n) time, and a reporting query takes O((k + 1) lg n) time, where k is the number
of points to enumerate.


As shown in Sect. 8.2.1, the information-theoretic lower bound for a point set on
[n]^2 is n lg n + O(n) bits. Therefore, the wavelet tree is a succinct data structure.


Theorem 8.2 Let P be a set of points on M = [1..n] × [1..n] in which all points have
distinct x-coordinates. Then, there exists a data structure using n lg n + o(n lg n)
bits that answers a counting query in O(lg n / lg lg n) time and a reporting query in
O((1 + k) lg n / lg lg n) time, where k is the number of points to output.


8.5 Proposed Data Structure 1: Improved Query Time 
Complexity 


This data structure uses the idea of adding data structures to the kd-tree to improve
the query time complexity [20]. First, we explain the idea of [20] in Sect. 8.5.1.
Next, we describe the index construction in Sect. 8.5.2 and the range search algorithm
in Sect. 8.5.3, and analyze the complexity in Sect. 8.5.4.


8.5.1 Idea for Improving the Time Complexity of the kd-Tree 


The method proposed in [20] improves the query time complexity of the kd-tree by
adding d many wavelet trees to the kd-tree such that the term n^{(d−1)/d} is replaced by
n^{(d−2)/d} (lg n if d = 2), at the cost of increasing the total complexity by a factor of
O(lg n). Note that we assume point sets are on [n]^d.

First, we construct the kd-tree for a given set P of points in the d-dimensional
space. Next, we label the nodes of the kd-tree with numbers based on the inorder
traversal of a binary tree defined as follows:


— If the root node has a left child, we traverse the sub-tree rooted at the node. 
— Examine the root node. 
— If the root node has a right child, we traverse the sub-tree rooted at the node. 


Figure 8.7 shows an example of a point set (left) and numbers assigned based on the 
inorder traversal of the kd-tree of the set (right). 

Next, we make point sets P_i (i = 0, ..., d − 1) with n points on [n]^2. The
two-dimensional point set P_i is created as follows. If a point p in the original
d-dimensional point set P has the i-th coordinate value p_i and the inorder position of
the node of the kd-tree containing p is j, we add point (j, p_i) to P_i. Figure 8.8 shows
the point sets P_0, P_1 created from the point set in Fig. 8.7.

Fig. 8.7 A two-dimensional point set (left) and the corresponding kd-tree (right). The numbers of
nodes are assigned by an inorder traversal of the kd-tree. The dashed lines in the left figure show
the partition of the space by the kd-tree

Fig. 8.8 Two-dimensional point sets obtained from the point set in Fig. 8.7


From these two-dimensional point sets P_0, ..., P_{d−1}, we construct wavelet trees
W_0, ..., W_{d−1}. The wavelet tree W_i can be thought of as constructed from an integer
sequence A_i containing the i-th coordinate value of points in P in the order of the
kd-tree.
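The sequences A_i can be produced without materializing the kd-tree, as the
following Python sketch suggests. It assumes that the splitting dimension cycles
through 0, ..., d − 1 with the depth and that the median of an interval [a, b] of
inorder numbers receives number ⌈(a + b)/2⌉; the tie-breaking rule and the function
name inorder_arrays are illustrative choices, not taken from [20].

```python
def inorder_arrays(points):
    """Return [A_0, ..., A_{d-1}], where A_i[j] is the i-th coordinate of point j."""
    n, d = len(points), len(points[0])
    numbered = [None] * n                # numbered[j] = point at inorder position j

    def build(pts, a, b, depth):
        if a > b:
            return
        pts = sorted(pts, key=lambda p: p[depth % d])
        c = (a + b + 1) // 2             # ceil((a + b) / 2): assumed median rule
        numbered[c] = pts[c - a]         # the point stored in the sub-tree root
        build(pts[: c - a], a, c - 1, depth + 1)
        build(pts[c - a + 1:], c + 1, b, depth + 1)

    build(list(points), 0, n - 1, 0)
    return [[p[i] for p in numbered] for i in range(d)]


A = inorder_arrays([(0, 3), (1, 1), (2, 2), (3, 0)])
# The point set P_i is then {(j, A[i][j]) : j = 0, ..., n - 1}.
```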

These data structures can be used for range searches as follows. Given a query
range Q, we perform the original search using the kd-tree. In the original algorithm,
as explained in Sect. 8.3, we traverse the kd-tree and shrink the range R(v), and when
CDeg(R(v), Q) = d (i.e., R(v) ⊆ Q), we know that all the points in the sub-tree
rooted at v are contained in Q. By using the d wavelet trees, we can terminate the
search when CDeg(R(v), Q) = d − 1. Assume that when a node v is visited, R(v)
is contained in Q for all dimensions except for i. The inorder numbers of nodes in
the sub-tree rooted at node v have consecutive values. Let [a, b] be the interval for
the numbers. Then, the points in this interval are contained in Q except for dim. i.
This implies that points in P_i that are contained in the range [a, b] × [l_i^(Q), u_i^(Q)] are
contained in Q even for dim. i. Therefore, after finding the node v, it is sufficient to
search the range [a, b] × [l_i^(Q), u_i^(Q)] of P_i using wavelet tree W_i.

The number of nodes of the kd-tree visited by this method is O(n^{(d−2)/d}) (O(lg n)
for the case d = 2). The search of the last dimension using the wavelet tree takes
O(lg n) time for a counting query. Therefore, the time complexity for a counting
query using the kd-tree is improved to O(n^{(d−2)/d} lg n) (O(lg² n) for the case d = 2).


8.5.2 Index Construction 


We now explain the proposed data structure. First, we construct the kd-tree for a given
point set P. Note that this kd-tree is temporarily built in order to construct our data
structure, and is not included in the final structure. Next, as in Sect. 8.5.1, we number
the nodes of the kd-tree by an inorder traversal, and create d many two-dimensional
point sets P_0, ..., P_{d−1}. For each P_i, we create the data structure of [3]. Let B_i be
this data structure. Finally, we discard the kd-tree. The final data structure consists
of B_0, ..., B_{d−1}.


8.5.3 Range Search Algorithm 


We explain the algorithm for a reporting query using the data structure explained 
in the previous section. The pseudocode is shown in Algorithm 5. This algorithm
simulates a search of the kd-tree using B_0, ..., B_{d−1}. We explain it in comparison
with the search algorithm of the kd-tree. Note that we explain the algorithm assuming
the inorder number of each node v of the kd-tree is also assigned to the point V (v) 
stored in v. That is, if we say a point with number j, it is the point stored in the 
node with inorder position j. We also assume that for an interval [a, b] of point 
numbers, R([a, b]) denotes the range containing points that have numbers in [a, b]. In 
Algorithm 5, the interval [a, b] of point numbers always corresponds to the interval of 
inorder numbers of nodes in the sub-tree rooted at a node v of the kd-tree. Therefore, 
R([a, b]) coincides with R(v). 

Algorithm 5 REPORTING([a, b], Q)

Input: An interval of point numbers [a, b] and a query range Q.

Output: Points with numbers in [a, b] and which are contained in range Q.

 1: if CDeg(R([a, b]), Q) = d − 1 then
 2:     For the last dimension i such that R([a, b]) is not yet contained in Q, search
        [a, b] × [l_i^(Q), u_i^(Q)] of P_i using B_i and enumerate points contained in Q.
        For each point, compute coordinates using B_0, ..., B_{d−1}.
 3: else if R([a, b]) has no intersection with Q then
 4:     terminate
 5: else if a = b then
 6:     Examine the point with number a. If it is in Q, output it.
 7: else
 8:     c ← ⌈(a + b)/2⌉
 9:     Output the point with number c if it is in Q.
10:     REPORTING([a, c − 1], Q)
11:     REPORTING([c + 1, b], Q)
12: end if


If we use the kd-tree, we shrink the focused range R(v) by going down the
tree. In the proposed method, by shrinking the interval [a, b] of point numbers, we
reduce the corresponding range R([a, b]). Because the kd-tree stores the point V(v)
corresponding to a node v, we can obtain the information of the point used for
partitioning the space. In contrast, in the proposed method, points are not explicitly
stored. However, if the focused interval [a, b] coincides with the interval of inorder
numbers for the sub-tree rooted at a node v, we find that c = ⌈(a + b)/2⌉ is the number of
the point used for partitioning.¹ Furthermore, the intervals [a, c − 1] and [c + 1, b]
correspond to the intervals of the numbers for sub-trees rooted at the left and right
child of v, respectively. Therefore, by a recursive search of Algorithm 5, we can
obtain the correct partitioning points.

For the range R([a, b]), we can compute the ranges after a partition from the 
range before partition and the coordinates of the point used for partitioning similarly 
to the case of the kd-tree. 
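The simulated search can be sketched in Python as follows. This is a simplified
illustration, not the proposed structure itself: the succinct structures B_i are
replaced by plain arrays A[i] indexed by inorder number, the search of the last
dimension is a linear scan over [a, b] instead of a two-dimensional range query, and
the median rule c = ⌈(a + b)/2⌉ with splitting dimension depth mod d is assumed.
The function names and the small input are hypothetical.

```python
def inside(p, Q):
    """True if point p lies in the query box Q = [(lo_0, hi_0), ...]."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, Q))


def reporting(A, a, b, R, Q, depth, out):
    """Report the points numbered a..b that lie in Q; R bounds those points."""
    d = len(A)
    if a > b or any(rh < ql or qh < rl for (rl, rh), (ql, qh) in zip(R, Q)):
        return                               # empty interval, or R misses Q
    cdeg = sum(ql <= rl and rh <= qh for (rl, rh), (ql, qh) in zip(R, Q))
    if cdeg >= d - 1:
        # At most one dimension is undecided; the chapter answers this step with
        # a range query on B_i, here we simply scan the interval.
        for j in range(a, b + 1):
            p = tuple(A[i][j] for i in range(d))
            if inside(p, Q):
                out.append(p)
        return
    c = (a + b + 1) // 2                     # number of the partitioning point
    p = tuple(A[i][c] for i in range(d))
    if inside(p, Q):
        out.append(p)
    axis = depth % d
    left, right = list(R), list(R)
    left[axis] = (R[axis][0], p[axis])       # left sub-tree: coordinate <= p[axis]
    right[axis] = (p[axis], R[axis][1])      # right sub-tree: coordinate >= p[axis]
    reporting(A, a, c - 1, left, Q, depth + 1, out)
    reporting(A, c + 1, b, right, Q, depth + 1, out)


# A_0 and A_1 for the points (0,3), (1,1), (2,2), (3,0) in inorder numbering
A = [[1, 0, 2, 3], [1, 3, 2, 0]]
found = []
reporting(A, 0, 3, [(0, 3), (0, 3)], [(0, 2), (1, 3)], 0, found)
```

The only information taken from the kd-tree is the interval [a, b] and the depth,
which is what allows the tree itself to be discarded.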


8.5.4 Complexity Analyses 


We now analyze the complexities of the algorithm. First, we consider its space
complexity. We use d data structures of Bose et al. [3], each of which uses
n lg n + o(n lg n) bits as in Theorem 8.2. The total space complexity is then
dn lg n + o(dn lg n) bits.

Next, we consider the query time complexity. If we use the same analysis as
in [20], assuming d is a constant, we can show the number of nodes corresponding
to cells with containment degree of at most d − 1 is O(n^{(d−2)/d}). Here, we derive the
query time complexity using a novel analysis for non-constant d.


¹ In the kd-tree, at each depth, we partition the space by the median of the point set with respect
to a dimension, and therefore c = ⌈(a + b)/2⌉ is the number of the point used for partitioning. If
the point set contains an even number of points, we can obtain the correct partitioning point using
a predetermined rule.

The proposed method partitions the space for each dimension in order, in the same
fashion as for the kd-tree. As in [20], we define a series of partitions with respect to
dim. 0 to dim. d − 1 as a cycle. We then calculate the number T_m(n, d) of nodes at which
the containment degree with respect to Q is at most d − 2 in the m-th cycle. When the
(m − 1)-th cycle has finished, the space is partitioned into 2^{d(m−1)} many cells. Among
them, we count the number of cells for which the containment degree with respect to
Q is at most d − 2. These cells contain a (d − 2)-dimensional face of Q (an edge of
a cuboid if d = 3). A (d − 2)-dimensional face of a d-dimensional orthogonal range
Q is obtained by choosing two dimensions from the d dimensions and choosing the
upper side or the lower side of the range for each of the two dimensions. Therefore, Q
has 2^2 · C(d, 2) many (d − 2)-dimensional faces, where C(d, 2) = d(d − 1)/2. When the
(m − 1)-th cycle has finished, because each dimension is partitioned into 2^{m−1} cells,
the number of cells containing a (d − 2)-dimensional face is at most 2^{(m−1)(d−2)}. Then
after the (m − 1)-th cycle, the number of cells to be searched is at most

\[
  \binom{d}{2} \cdot 2^{2} \cdot 2^{(m-1)(d-2)} .
\]


In the sub-trees rooted at these nodes, the number of nodes in the m-th cycle is 2^d − 1.
Therefore, it holds that

\[
  T_m(n, d) \le (2^{d} - 1) \binom{d}{2} \cdot 2^{2} \cdot 2^{(m-1)(d-2)}
            < 2^{d+1} d(d-1) \, 2^{(m-1)(d-2)} .
\]


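Let N(n, d) be the number of nodes for which the containment degree with respect
to Q is at most d − 2. It then holds that

\[
\begin{aligned}
  N(n, d) &= \sum_{m=1}^{\frac{1}{d}\lg n} T_m(n, d) \\
          &\le 2^{d+1} d(d-1) \sum_{m=1}^{\frac{1}{d}\lg n} 2^{(d-2)(m-1)} \\
          &= 2^{d+1} d(d-1) \, \frac{2^{\frac{d-2}{d}\lg n} - 1}{2^{d-2} - 1}
             \qquad (d \ge 3) \\
          &= O\bigl(d^{2} n^{\frac{d-2}{d}}\bigr).
\end{aligned}
\]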
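As a sanity check of the bound, consider the case d = 3. The face-counting
argument gives 2^2 · C(3, 2) = 12 edges of the cuboid Q and 2^3 − 1 = 7 nodes per
cycle below each cell, so that T_m(n, 3) ≤ 7 · 12 · 2^{m−1} < 96 · 2^{m−1}. Summing
the geometric series over the (lg n)/3 cycles, under the assumption that (lg n)/3 is
an integer, yields:

```latex
\[
  N(n, 3) \;\le\; 96 \sum_{m=1}^{\frac{1}{3}\lg n} 2^{\,m-1}
          \;=\; 96 \left( 2^{\frac{1}{3}\lg n} - 1 \right)
          \;=\; 96 \left( n^{1/3} - 1 \right)
          \;=\; O\!\left(n^{1/3}\right).
\]
```

This agrees with the general form O(d^2 n^{(d−2)/d}) for constant d = 3.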


We use the fact that the containment degree is weakly increasing as we traverse down
the tree. In the proposed method, we terminate the search when the containment
degree reaches d − 1. The visited nodes are then those with containment degree of
at most d − 2 and their child nodes. There are at most 2N(n, d) such nodes. The
proposed method virtually traverses the kd-tree. It takes O(d lg n / lg lg n) time to compute
the coordinates of a point stored in a node. When we reach a node for which the containment
degree with respect to Q is d − 1, we search the last remaining dimension in
O(lg n / lg lg n) time. The time complexity of a counting query is then
O(d^3 n^{(d−2)/d} lg n / lg lg n). For a reporting query, it takes O(d lg n / lg lg n) time
to compute the coordinates of a point. The total time complexity is then
O((d^3 n^{(d−2)/d} + dk) lg n / lg lg n), where k is the number of points in Q.


In summary, we obtain the following: 


Theorem 8.3 For an orthogonal range search problem on the [n]^d space, there
exists a data structure that has space complexity of dn lg n + o(dn lg n) bits and
which answers a counting query in O(d^3 n^{(d−2)/d} lg n / lg lg n) time and a reporting
query in O((d^3 n^{(d−2)/d} + dk) lg n / lg lg n) time, where k is the number of points in
the query range.


8.6 Proposed Data Structure 2: Succinct and Practically 
Fast 


The second proposed method is a data structure that is succinct and practically fast.
In this method, we use d − 1 many wavelet trees to represent a point set on [n]^d. In
Sect. 8.6.1, we explain how to construct the data structure. In Sect. 8.6.2, we explain
the algorithm for the orthogonal range search problem. In Sect. 8.6.3, we analyze
the space and time complexities.


8.6.1 Index Construction 


In this method, we assume that the points of P have distinct values in the 0-th
coordinate.

First, we create length-n integer arrays A_1, ..., A_{d−1}. The array A_i corresponds
to dim. i, and stores the i-th coordinate value of the points in increasing order of the
0-th coordinate value. Next, for those arrays we create wavelet trees W_1, ..., W_{d−1}.
The wavelet tree W_i can be considered to represent the two-dimensional point set
P_i generated from the d-dimensional point set P by projecting the points onto the
plane spanned by the 0-th axis and the i-th axis. Figure 8.9 shows an example.

Fig. 8.9 A three-dimensional point set P and two-dimensional point sets P_1, P_2
generated by projecting P onto each plane
Algorithm 6 REPORT(Q)

Input: A query range Q = [l_0^(Q), u_0^(Q)] × ··· × [l_{d−1}^(Q), u_{d−1}^(Q)].

Output: The coordinates of points of P contained in Q.

 1: D := ∅
 2: for i = 1 to d − 1 do
 3:     if [l_i^(Q), u_i^(Q)] ⊂ [0, n − 1] then
 4:         D := D ∪ {i}
 5:         c_i := COUNT(W_i, [l_0^(Q), u_0^(Q)] × [l_i^(Q), u_i^(Q)])
 6:     end if
 7: end for
 8: Sort elements i_1, ..., i_|D| of D in increasing order of c_i.
 9: A := REPORTX(W_{i_1}, [l_0^(Q), u_0^(Q)] × [l_{i_1}^(Q), u_{i_1}^(Q)])
10: for i = i_2 to i_|D| do
11:     for all a ∈ A do
12:         if the i-th coordinate of the point whose 0-th coordinate is a is not contained in
            [l_i^(Q), u_i^(Q)] then
13:             A := A \ {a}
14:         end if
15:     end for
16: end for
17: for all a ∈ A do
18:     Obtain the coordinates of the point whose 0-th coordinate is a and output them.
19: end for


8.6.2 Range Search Algorithm 


Next, we explain how to solve the orthogonal range search problem using the data
structure (Algorithm 6).

Assume that a query range Q = [l_0^(Q), u_0^(Q)] × ··· × [l_{d−1}^(Q), u_{d−1}^(Q)] is given. For
each i = 1, ..., d − 1 such that [l_i^(Q), u_i^(Q)] ≠ [0, n − 1], that is, each dimension i
used for the search, we count the number of points of P_i that are contained in the range
[l_0^(Q), u_0^(Q)] × [l_i^(Q), u_i^(Q)] using wavelet tree W_i (counting query). Let m (= |D|)
be the number of i (= 1, ..., d − 1) such that [l_i^(Q), u_i^(Q)] ≠ [0, n − 1], and let
i_1, ..., i_m be these dimensions sorted in increasing order of the answers to the counting
queries.

Using wavelet tree W_{i_1}, we then enumerate only the x coordinates of points of P_{i_1}
contained in [l_0^(Q), u_0^(Q)] × [l_{i_1}^(Q), u_{i_1}^(Q)] and store them in a set A. For each element
a of A and for each i = i_2, ..., i_m, we check whether the i-th coordinate of the point
for which the 0-th coordinate is a is contained in the query range. The elements
remaining in A correspond to points in the query range. The answer to a counting
query is the cardinality of A. For a reporting query, we compute coordinates of the
points and output them.

The reason we compute the number of points contained in each dimension by a
counting query is twofold. Firstly, the x-coordinate (the 0-th coordinate) of points
contained in the query range with respect to the i_1-th (and the 0-th) dimension can
be output quickly at line 9 of the algorithm if the number of points to enumerate is
small. Secondly, in the double loop from line 10 to line 16, we want to reduce the
size of A as soon as possible.
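The control flow of Algorithm 6 can be sketched in Python as follows. The sketch
replaces each wavelet tree W_i by the plain array A[i] it represents, so the counting
and enumeration steps are linear scans rather than O(lg n)-time queries. It assumes
that the 0-th coordinates are exactly 0, ..., n − 1, that Q is a list of inclusive
(lo, hi) tuples, and that an empty D (no restricted dimension among 1, ..., d − 1)
simply returns every point in the 0-th range; this last case and the function name
report are additions for illustration.

```python
def report(A, Q):
    """A[i][x] is the i-th coordinate of the point whose 0-th coordinate is x."""
    d, n = len(Q), len(A[0])
    lo0, hi0 = Q[0]
    xs = range(max(lo0, 0), min(hi0, n - 1) + 1)

    def in_range(i, x):
        return Q[i][0] <= A[i][x] <= Q[i][1]

    def count(i):                        # stands in for a counting query on W_i
        return sum(1 for x in xs if in_range(i, x))

    # Dimensions actually restricted by the query, most selective first
    D = sorted((i for i in range(1, d) if Q[i] != (0, n - 1)), key=count)
    if not D:
        cand = list(xs)
    else:
        cand = [x for x in xs if in_range(D[0], x)]     # line 9: enumerate x only
        for i in D[1:]:                                 # lines 10-16: filter
            cand = [x for x in cand if in_range(i, x)]
    return [tuple(A[i][x] for i in range(d)) for x in cand]


n = 8
A = [list(range(n)), [4, 2, 7, 5, 0, 3, 1, 6], [(3 * x + 1) % n for x in range(n)]]
hits = report(A, [(1, 6), (1, 4), (0, 7)])   # dimension 2 is unrestricted
```

Sorting D by the counts means the first enumeration is over the smallest candidate
set, which is the motivation given in the text.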


8.6.3 Complexity Analyses 


Consider the space and time complexities of the proposed method. 

For the space complexity, we use $d-1$ wavelet trees. Therefore, the space
complexity is $(d-1)\,n \lg n + (d-1) \cdot o(n \lg n)$ bits.

For the query time complexity, let $m$ be the number of wavelet trees used in a
search. The time to perform $m$ counting queries on the wavelet trees is $O(m \lg n)$. We then
sort $m$ integers in $O(m \lg m)$ time. Next, we enumerate the x-coordinates of points
contained in the query range for the dimension with the minimum number of points.
Let $c_{i_1} = c_{\min}$ be the number of points to enumerate. This takes $O((1 + c_{\min}) \lg n)$
time. The time to check whether these points are contained in the query range for
the other dimensions is $O((m-1)\, c_{\min} \lg n)$. Let $d'$ be the number of dimensions used
in the query; then $m \le d'$. Therefore, the query time complexity can be
written as $O(d' c_{\min} \lg n + d' \lg d')$.
8.7 Conclusion


In this chapter, we first reviewed data structures for high-dimensional orthogonal 
range search. We then proposed two data structures for the problem. 

The first one simulates the search of the kd-tree using d succinct data structures
for two-dimensional orthogonal range search [3]. We improved the
query time complexity of the KDW-tree while keeping the same space complexity.

The second one is succinct and fast in practice. Its space complexity is
$(d-1)\,n \lg n + (d-1) \cdot o(n \lg n)$ bits, which is succinct. Its worst-case query time
complexity is $O(dn \lg n)$, which is not good. However, if the number $d$ of dimensions
is large but the number $d'$ of dimensions used in a search is small, it runs fast
in practice.
Chapter 9
Enhanced RAM Simulation in Succinct Space


Taku Onodera 


Abstract We describe two recent results on space-efficient functional random access 
memory (RAM), which is RAM with non-standard functionalities. The first is about 
oblivious RAM, which enables a remote database to be accessed without revealing to 
the database owner which part of the database is being accessed. The other is about 
wear leveling, which enables the number of updates to be balanced among all the 
memory cells regardless of the content of the computation being performed on the 
memory. 


9.1 Introduction 


Random access memory (RAM) underlies most modern computers, and improve- 
ments to the RAM itself can have a positive impact on a wide range of applications. 
For example, faster RAM access makes all RAM-based computations correspond- 
ingly faster. Some types of RAM improvements are not just about efficiency but 
also about functionality. An example is virtual memory in operating systems, which 
enables, among other things, application programs to utilize the memory without con- 
cern about cumbersome management issues such as allocation. Generally speaking, 
this type of RAM improvement functions by using conventional RAM to simulate 
“enhanced” RAM while introducing some performance overhead. 

In this chapter, we describe two such enhanced RAM simulations, oblivious
RAM (ORAM) and wear leveling, with an emphasis on how to minimize the
space overhead. These topics were chosen mainly because of the authors’ knowledge of
them, although there are also some conceptual similarities between ORAM and wear
leveling. Other functionality-enhanced RAM simulations include initializable array
[2], memory checking [3], locally decodable code [11], and huge random object [8].¹


9.2 Oblivious RAM 


9.2.1 Problem 


Suppose you want to outsource a database, stored in RAM, to a server and want to 
access it in a privacy-preserving way. Although you can hide the data content by 
encryption, the server can still see which part of the RAM you are accessing. This is 
a serious issue in the current era of cloud computing. The same problem also appears 
when one wants to hide the details of software implemented in a physically secure 
processor that accesses insecure main memory. 

Oblivious RAM (ORAM) is the formalization and corresponding solution of this
problem. Typically, it works by storing the RAM in some data structure on the server
and moving the RAM cells dynamically within the data structure as the user accesses the
RAM.

As an example, consider a scheme where the server stores the RAM as-is except
that each cell is encrypted with the user's key. To access the ith cell, the user performs
the following procedure for j = 1 to N, where N is the number of cells:


1. Retrieve the jth cell from the server.
2. Decrypt the retrieved cell.
3. If i = j:


• For read access, copy the decrypted value to local memory.
• For write access, change the decrypted value to the new value.


4. Re-encrypt the possibly changed decrypted value.
5. Store the re-encrypted value back in the jth cell on the server.


We assume semantically secure encryption whenever encryption is used in this chapter. In
particular, there is an overwhelmingly high probability that the re-encrypted ciphertext
looks totally different to the server from the ciphertext before re-encryption,
regardless of whether the plaintext is updated or not. Thus, no matter what actual
access is performed, all that the server can see is that random-looking encrypted
cells are updated to still random-looking re-encrypted cells in a fixed scan order. Of
course, the access overhead of this method is very large since each cell access takes
time linear in the entire RAM size. The purpose of this example is merely to illustrate
the kind of security we want to achieve.

1 A huge random object, in this context, is a succinct representation of a pseudorandom object
that supports certain queries. For example, a pseudorandom function can be thought of as a huge
pseudorandom bitstring that is implicitly represented by a tiny seed and supports efficient random
access. This is not a data structure in the conventional sense because the represented object is a
pseudorandom bitstring instead of "data."
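
The scan-based scheme can be sketched as follows. This is a toy illustration, not a secure implementation: the `encrypt`/`decrypt` helpers are a hypothetical randomized stream cipher built from SHA-256 purely to make re-encryptions look fresh, cells are indexed from 0, and the class name `ScanORAM` and its `trace` field (the physical access pattern the server observes) are assumptions of this sketch.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Toy randomized encryption: a fresh nonce makes every ciphertext differ.
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class ScanORAM:
    def __init__(self, key: bytes, n_cells: int, cell_bytes: int):
        self.key = key
        # Server-side state: one ciphertext per cell.
        self.server = [encrypt(key, bytes(cell_bytes)) for _ in range(n_cells)]
        self.trace = []  # physical cell indices touched, as seen by the server

    def access(self, op: str, i: int, value: bytes = None):
        result = None
        for j in range(len(self.server)):
            plain = decrypt(self.key, self.server[j])   # steps 1 and 2
            self.trace.append(j)
            if j == i:                                   # step 3
                if op == "read":
                    result = plain
                else:
                    plain = value
            self.server[j] = encrypt(self.key, plain)    # steps 4 and 5
        return result


oram = ScanORAM(b"demo key", n_cells=4, cell_bytes=4)
oram.access("write", 2, b"abcd")
print(oram.access("read", 2))  # b'abcd'
```

Every access touches cells 0, 1, ..., N − 1 in the same order, so `trace` is independent of which cell was requested; the price is the linear access overhead noted in the text.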

We now give a more formal problem description. We have three parties: the user,
the server, and the simulator. The simulator models a program that runs in the local
environment of the user. The simulator provides the user with an access interface
to RAM that we call the virtual RAM, while the server provides the simulator with
an access interface to RAM that we call the physical RAM. That is, the user gives
the simulator a series of queries of the form (type, i, v) where type ∈ {read, write},
$i \in [N]$, and $v \in \{0,1\}^{B}$. We call these virtual queries. The parameter N specifies
the number of virtual cells (cells in the virtual RAM) while B specifies the size
of each virtual cell. Given a virtual query, the simulator gives the server another
series of queries of the form (type, i, v), where type ∈ {read, write}, $i \in [N']$, and
$v \in \{0,1\}^{B'}$. We call these physical queries. Here $N'$ and $B'$ are the number and the
size of the physical cells. The simulator, and thus the physical
queries, is probabilistic in general.² The server responds to physical queries in the
obvious way. That is, for (read, i, ∗), where “∗” means that the third component is
arbitrary, the server returns the value of the ith physical cell, and for (write, i, v), the
server updates the value of the ith physical cell to v. If the virtual query from the
user is of the form (read, i, ∗), the simulator derives the value of the ith virtual cell
through the interaction with the server and returns it to the user. If the virtual query
is of the form (write, i, v), the simulator updates the value of the ith virtual cell to
v. The simulator must respond to the virtual queries online. We call the sequence
of second components of the virtual queries (resp. physical queries) a virtual access
pattern (resp. physical access pattern). For a virtual query sequence q, let a(q) denote
the physical access pattern induced by q. Recall that a(q) is a random variable in
general. The ORAM scheme is secure if $a(q_1)$ and $a(q_2)$ are indistinguishable for any
virtual query sequences $q_1$ and $q_2$ of the same length. There are some variations in the
exact meaning of “indistinguishable.” Typically used meanings of indistinguishability,
in descending order of security, are (a) equally distributed, (b) statistically closely
distributed, and (c) computationally indistinguishable. The main performance metrics of
ORAM are the access overhead, which is the number of physical queries processed
for each virtual query, the simulator local space size, and the server space size, which
is $B'N'$ bits.

As mentioned above, the simulator models a program running in the local environ- 
ment of the user. Thus, in practice, we do not distinguish the user and the simulator. 
For example, we refer to the simulator local space as user space. 

The ORAM problem is non-trivial only if the user space is smaller than BN bits 
since otherwise, the simulator can store the entire RAM locally and ignore the server. 


2 The simulator of the scan-based method is deterministic, and it is not hard to see that its linear
access overhead is optimal if we restrict the simulator to be deterministic. Thus, all simulators of
interest are indeed probabilistic.
Table 9.1 Summary of existing results. † means amortized bound. ∼log N means O(f(N)) for
any f ∈ ω(log N). The constant factor of the user space of [26] is ≪ 1

Ref.  Access overhead        Server space (N′)  User space  Technique     Security
[7]   O(√N log N)†           N + 2√N            O(1)        Square root   Computational
[9]   O(log³ N)†             O(N log N)         O(1)        Hierarchical  Computational
[19]  O(√N log N)            (1 + Θ(1))N        O(1)
[19]  O(log³ N)              O(N log N)         O(1)
[10]  O(log² N)†             (1 + Θ(1))N        O(1)
[13]  O(log² N / log log N)  (1 + Θ(1))N        O(1)
[20]  O(log N log log N)†    (1 + Θ(1))N        O(1)
[12]  O(log N)†              (1 + Θ(1))N        O(1)
[25]  O(log³ N)              O(N log N)         O(1)        Tree          Statistical
[27]  O(log² N)              (1 + Θ(1))N        ∼log N
[17]  O(log² N)              (1 + o(1))N        ∼log N

9.2.2 Existing Results 


Table 9.1 gives a summary of some of the existing results. Every method has physical
cell size $B' = B + O(\log N)$. There is an $\Omega(\log N)$ lower bound for the access
overhead if the user space is at most $N^{1-\epsilon}$ for a constant $\epsilon > 0$ [14].

There are mainly two types of techniques that are actively studied: hierarchical
approaches and tree-based approaches. Asymptotically, the state-of-the-art hierarchical
method [12] has access overhead matching the lower bound mentioned above
while the state-of-the-art tree-based methods have about log N times larger asymptotic
access overhead. Yet, tree-based methods are still of practical interest because
they tend to have much smaller constant factors in the access overhead than the
hierarchical methods. The access overhead bounds of tree-based methods also hold in the worst
case, whereas those of the hierarchical methods are often amortized. Although there are
techniques to achieve competitive worst-case access overhead via the hierarchical
methods [13, 19], they tend to be complex and add further constant factors to the
performance bounds.

In the past, the ORAM research community has focused mainly on reducing the 
access overhead because it was the biggest obstacle to applying ORAM in practice. 
However, some recent studies have achieved practical access performance [16, 22]
by combining tree-based ORAM with special hardware. For example, the PHANTOM
secure processor system [16] supports access pattern-hiding SQL queries with
a time overhead of 1.2–6× compared to the standard insecure version. Thus, at least
for tree-based ORAM, exploration of aspects other than access overhead is beginning
to make sense.


3 The square root method [7] was the first non-trivial ORAM and is the origin of some of the ideas 
underlying the hierarchical methods. 
We describe the recent development of techniques for reducing the number of
physical cells $N'$ of the tree-based ORAM to $(1 + o(1))N$ [17]. Note that there also
is a space overhead originating from the cell size: typically, $B' = B + c \lg N$ where
c is a small constant such as 2. We ignore the cell size overhead and focus on the cell
number because the typical value of B is 128 bytes in the secure processor setting
and the overhead with respect to the cell size is just a few percent.


9.2.3 Tree-Based Methods 


The tree-based method of Stefanov et al. [27] works as follows. The server organizes
the physical cells into a complete binary tree with N leaves where each node is a
bucket, a container that can accommodate a constant number of virtual cells. Each
virtual cell has a position label, an integer in [N], and a virtual cell with position
label i is stored either in some bucket on the path from the root to the ith leaf or in a
stash, which is a container in the user’s local memory that can accommodate a small
number of virtual cells. Let $v_i$ be the ith virtual cell and let $p_i$ be the position label
of $v_i$. Suppose the user maintains $p_i$ for all $i \in [N]$ in local memory. This requires
O(N) user space but simplifies the exposition. We will reduce the user space later.
To access $v_i$, the user retrieves all of the cells stored in the buckets on the path from
the root to the $p_i$th leaf and puts them in the stash. Let this path be P. At this point,
$v_i$ must be in the stash. The user copies the
value of $v_i$ to somewhere in its local memory for a read query or changes it to some
other value for a write query. Then, the user updates $p_i$ to a fresh random value in
[N]. After that, the user scans the buckets on P from the leaf to the root, and for each
bucket, moves cells in the stash to the bucket greedily while respecting the position
labels and the bucket capacity. See Fig. 9.1 for an example.
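
The access procedure can be sketched in Python as follows. This is a simplified illustration under explicit assumptions: the position labels are kept in local memory (no recursion), cells are not encrypted, the stash is unbounded, cells and leaves are indexed from 0, and a cell that was never written reads as `None`. The class name `PathORAM` and its fields are choices made for this sketch, not the authors' code.

```python
import random


class PathORAM:
    """Toy tree-based ORAM with N leaves and a local position map."""

    def __init__(self, n_leaves, bucket_size=5, seed=None):
        assert n_leaves >= 1 and (n_leaves & (n_leaves - 1)) == 0, "power of two"
        self.n = n_leaves
        self.z = bucket_size
        self.rng = random.Random(seed)
        # Heap layout: node 1 is the root, nodes n .. 2n-1 are the leaves.
        self.tree = [dict() for _ in range(2 * n_leaves)]  # bucket: {cell id: value}
        self.pos = [self.rng.randrange(n_leaves) for _ in range(n_leaves)]
        self.stash = {}

    def _path(self, leaf):
        """Nodes from the given leaf up to the root (leaf first)."""
        node, path = self.n + leaf, []
        while node >= 1:
            path.append(node)
            node //= 2
        return path

    def access(self, op, i, value=None):
        path = self._path(self.pos[i])
        # Move every cell on the path into the stash.
        for node in path:
            self.stash.update(self.tree[node])
            self.tree[node] = {}
        result = self.stash.get(i)
        if op == "write":
            self.stash[i] = value
        # Assign a fresh random position label to the accessed cell.
        self.pos[i] = self.rng.randrange(self.n)
        # Write back greedily from the leaf to the root.
        for height, node in enumerate(path):
            # Cell k may live in `node` only if `node` is on the path to leaf pos[k].
            fits = [k for k in self.stash
                    if ((self.n + self.pos[k]) >> height) == node]
            for k in fits[: self.z]:
                self.tree[node][k] = self.stash.pop(k)
        return result


oram = PathORAM(n_leaves=8, seed=1)
oram.access("write", 3, "hello")
print(oram.access("read", 3))  # hello
```

The server only ever sees a root-to-leaf path chosen by a label that is discarded right after use, which is why the observed pattern does not depend on the requested cell.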

Sometimes, some virtual cells in the stash cannot be moved back to the tree. For
example, if all cells are assigned the same position label, only Θ(log N) physical
cells can be used to store the virtual cells and thus, most virtual cells must end up
in the stash. (Of course, if N is large, such an event happens only extremely rarely.)
Stefanov et al. proved that if the bucket size is at least 5, the probability that more than
R cells are left in the stash after processing a query is exponentially small in R. Thus,
if the stash size is ω(log N), the stash overflows during processing a polynomial number of
queries with only negligible probability.

To reduce the user space for position labels, the user outsources the position labels
using the same method recursively. That is, each position label is a $\lceil \lg N \rceil$-bit integer
and the table for position labels of all virtual cells can be thought of as a RAM
storing the $N \lceil \lg N \rceil$-bit concatenation of the integers. Thus, the original problem of
hiding the access pattern to RAM consisting of N cells, each of B bits, is reduced
to hiding the access pattern to RAM consisting of $N \lceil \lg N \rceil / B$ cells, each of B bits.
If, say, $B > 2 \lg N$, which is a completely reasonable assumption for all reasonable
N,⁴ the problem size (cell number) decreases exponentially and reaches O(1) after
O(log N) levels of recursion. At that point, the user can store the O(1) size RAM
locally, terminating the recursion.

4 Recall that the typical value of B is 128 bytes in secure processor applications.

Fig. 9.1 Example access process for reading the 4th virtual cell. N = 4. Bucket size is 1. The
expression $i^j$ means the ith virtual cell with position label j. The path from the root to the
first leaf is scanned from top to bottom in step (2) and from bottom to top in step (4)
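
How quickly the position-label RAM shrinks under recursion can be checked with a short calculation. The sketch below simply iterates the reduction from N cells to ⌈N⌈lg N⌉/B⌉ cells; the function name `recursion_sizes` and the choice N = 2^20 with B = 1024 bits (128 bytes) are illustrative assumptions.

```python
def recursion_sizes(n_cells, cell_bits):
    """Cell counts of the RAMs at successive recursion levels."""
    sizes = [n_cells]
    while sizes[-1] > 1:
        n = sizes[-1]
        label_bits = max(1, (n - 1).bit_length())    # ceil(lg n) for n >= 2
        nxt = -(-(n * label_bits) // cell_bits)      # ceil(n * label_bits / B)
        if nxt >= n:                                 # B too small: no progress
            break
        sizes.append(nxt)
    return sizes


print(recursion_sizes(2**20, 1024))  # [1048576, 20480, 300, 3, 1]
```

With these parameters the recursion bottoms out after four levels, well within the O(log N) bound stated above.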

The tree at the top level of recursion has size O(N), and the sizes of the trees at deeper
recursion levels decrease exponentially. The access overhead is proportional to the sum of the
heights of the trees at all recursion levels, which is $O(\log^2 N)$. The server space is
proportional to the sum of the sizes of the trees at all recursion levels, which is O(N).

Although each recursion level requires a stash, the numbers of cells left in those
stashes are independent, and it turns out that the probability that many cells are left in
all the stashes in total is still exponentially small. Thus, f(N) user space is enough for any
$f(N) \in \omega(\log N)$.


9.2.4 Succinct Construction


The constant factor hidden in the Θ(N) server space bound of the method described
above is about 10: the top-level tree has 2N nodes, each of capacity 5, while the
size of the recursive trees is negligible because typically, B is much larger than
lg N. (Theoretically, we assume $B = \omega(\lg N)$.) Though one can reduce this constant
factor to some extent by decreasing the tree height while tuning the bucket size, it is
not possible to achieve a factor < 2 while maintaining a meaningful stash overflow
probability, at least using the currently known analysis techniques. This method also
leads to prohibitively large access overhead as the server space becomes close to 2N.
We now describe a method for achieving $(1 + o(1))N$ server space with a modest
sacrifice in access overhead [17].
Fig. 9.2 Large leaf layout (leaf size: $\lg^{1.4} N + \lg^{1.3} N$; number of leaves: $N / \lg^{1.4} N$)


The idea is to modify the layout of the tree at the top recursion level so that the leaf
number is $N/\lg^{1.4} N$ and the leaf size is $\lg^{1.4} N + \lg^{1.3} N$ (see Fig. 9.2). It is obvious
that the tree size is $(1 + 1/\lg^{0.1} N)N$ while the access overhead remains $O(\log^2 N)$. We
now explain why the stash overflow probability remains small.

Let $N_i$ be the number of cells with position label i for $i \in [N/\lg^{1.4} N]$. Let the
load of a bucket be the number of cells stored in the bucket. At each moment in
the lifetime of the scheme, $N_i$ follows the binomial distribution with parameters N
and $\lg^{1.4} N / N$ for each i. The probability that this becomes larger than
$\lg^{1.4} N + \lg^{1.3} N$ is negligible. Thus, with overwhelming probability, no leaf becomes full
while processing a polynomial number of queries. Under this assumption, the distribution of
the internal bucket loads is dominated by the distribution of the loads of the corresponding
$N/\lg^{1.4} N - 1$ internal buckets in the standard N-leaf layout scheme described above. This is so
because the internal buckets in the large leaf layout do not need to store the cells that
overflow from the leaves. Thus, assuming no leaf becomes full, the stash overflow
probability of the large leaf case is negligible. The same is true even without this
assumption because there is only a negligible probability that the assumed case does
not occur.
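
One way to see why a leaf overflow is negligible is a multiplicative Chernoff bound. The following is a sketch of such a calculation, assuming the standard form $\Pr[X \ge (1+\eta)\mu] \le \exp(-\eta^2\mu/(2+\eta))$ for a binomial variable $X$ with mean $\mu$; the source states only the conclusion, so the choice of this particular inequality is an assumption made here.

```latex
\begin{align*}
\mu &= \mathbb{E}[N_i] = N \cdot \frac{\lg^{1.4} N}{N} = \lg^{1.4} N,
\qquad
\eta = \lg^{-0.1} N
\;\Longrightarrow\;
(1+\eta)\mu = \lg^{1.4} N + \lg^{1.3} N, \\[4pt]
\Pr\bigl[N_i \ge (1+\eta)\mu\bigr]
&\le \exp\!\Bigl(-\frac{\eta^{2}\mu}{2+\eta}\Bigr)
 = \exp\!\Bigl(-\frac{\lg^{1.2} N}{2+\lg^{-0.1} N}\Bigr)
 \le \exp\!\Bigl(-\tfrac{1}{3}\lg^{1.2} N\Bigr)
 = N^{-\omega(1)} \qquad (N \ge 2).
\end{align*}
```

Since $\lg^{1.2} N$ grows faster than $\log N$, the bound stays negligible after a union bound over all $N/\lg^{1.4} N$ leaves and over a polynomial number of queries.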

We can reduce the $N/\lg^{0.1} N$ extra term in the tree size even further by “the
power of two choices.” That is, we give two random position labels to each virtual
cell. One is primary, which determines the path on which the cell can reside, while
the other is secondary, which is a dummy needed to hide the access pattern. Now,
$N_i$ is the number of virtual cells with primary position label i. We maintain $N_i$ for
all i in a sub-ORAM in the same way we store position labels in recursive ORAM.
To access a virtual cell v with primary label $p_1$ and secondary label $p_2$, we retrieve all
cells on the path from the root to the $p_1$th leaf and the path from the root to the $p_2$th leaf.
We choose two random labels $p'_1, p'_2$ and let $p'_1$ (resp. $p'_2$) be the new primary
(resp. secondary) label of v if $N_{p'_1} < N_{p'_2}$.
Otherwise, we exchange the roles of $p'_1$ and $p'_2$. We then scan the paths specified by
the old labels and greedily move back the cells as in the previous method. Here, $N_i$ is
not binomial but concentrated much more tightly around the mean due to the effect
of the two choices. Thus, the “head space” for each leaf can be much smaller than
$\lg^{1.3} N$, leading to a smaller tree size.
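
The effect of the two choices on leaf loads can be explored with a small simulation. The sketch below is a simplification: it assigns labels once to each cell (a static balls-into-bins experiment) instead of re-drawing labels on every access as the scheme does, and the function name `leaf_loads` and its parameters are assumptions of this sketch.

```python
import random


def leaf_loads(n_cells, n_leaves, two_choices, seed=0):
    """Assign each cell a primary leaf label and return the per-leaf loads."""
    rng = random.Random(seed)
    load = [0] * n_leaves
    for _ in range(n_cells):
        p1 = rng.randrange(n_leaves)
        if two_choices:
            p2 = rng.randrange(n_leaves)
            # Keep p1 as primary only if its leaf is strictly less loaded.
            if not (load[p1] < load[p2]):
                p1, p2 = p2, p1
        load[p1] += 1
    return load


one = leaf_loads(100_000, 1_000, two_choices=False)
two = leaf_loads(100_000, 1_000, two_choices=True)
print("mean load:", 100_000 // 1_000)
print("max load, one choice: ", max(one))
print("max load, two choices:", max(two))
```

Comparing the two maxima against the mean load gives a feel for how much head space each leaf needs in either setting.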
By tuning the parameters, the first technique (large leaf layout) alone can achieve
about (1 + @ (eX + —_))N server space while the second technique (two 


B Мор N 
choices) decreases it to (1 + © (P5 + NS 


9.2.5 Open Problem 


It is unknown whether the optimal O(log N) access overhead and $(1 + o(1))N$ server
space can be achieved at the same time. There are two natural approaches for answering
this question affirmatively:


• Develop a technique for making hierarchical methods, such as [9], succinct and
  apply it to the existing optimal method [12]. This seems particularly challenging
  if we further require a worst-case (instead of amortized) access overhead bound
  because the existing techniques for achieving a worst-case access overhead bound
  in the hierarchical approach [13, 19] require maintaining multiple versions of the
  database.

• Achieve O(log N) access overhead by a tree-based approach and apply the techniques
  described here. The first part is already an open problem of sufficient
  interest.


9.3 Wear Leveling 


9.3.1 Problem 


Consider the case where you have RAM with the limitation that each cell state can be 
updated at most a certain number of times. Once the number of updates has reached 
the limit, the cell dies and you can no longer update it. The utility of the RAM quickly 
degrades as the cells start to die because the total amount of information that can be 
stored decreases, and it becomes cumbersome to manage which cells are still alive. 
Thus, the number of times you can support updates before cells start to die is of 
primary interest. This number depends heavily on the case. In the best case where 
the updates are uniform among the cells, you can perform nL updates where n is the
number of cells and L is the number of times each cell can be updated. In contrast, in 
the worst case where all updates fall onto a particular cell, you can perform updates 
only L times. Wear leveling is the problem of prolonging the memory lifetime as 
much as possible while keeping the associated overhead, if any, as small as possible. 
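
The gap between the best and worst cases can be made concrete with a tiny simulation. The helper `writes_before_first_death` below is a hypothetical name introduced for this sketch; it counts how many updates succeed before some cell would exceed its limit L, with no wear leveling applied.

```python
from itertools import cycle, repeat


def writes_before_first_death(n, limit, addresses):
    """Number of writes that succeed before a cell would be updated more than `limit` times."""
    wear = [0] * n
    done = 0
    for a in addresses:
        if wear[a] == limit:   # this write would kill cell a
            return done
        wear[a] += 1
        done += 1
    return done


n, L = 4, 3
print(writes_before_first_death(n, L, cycle(range(n))))  # 12  (= nL, uniform updates)
print(writes_before_first_death(n, L, repeat(0)))        # 3   (= L, one hot cell)
```

A wear-leveling scheme aims to make every update sequence, including the hot-cell one, behave close to the uniform case.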

5 These bounds include the cell size overhead that we ignored in the main explanation for brevity.

The systems community has been studying wear leveling for decades. Historically,
flash memory was the main motivation for studies conducted from the late 1980s
to the mid 2000s [1, 6, 15]. Today, the main motivation for wear leveling comes
from phase change memory (PCM), which is an emerging next-generation memory
technology that has many features, including low latency, energy efficiency, and 
non-volatility [5]. Each PCM cell supports only 105—10? updates, which means that 
cells can start dying within minutes or even seconds if no effort is made to perform 
wear leveling. PCM differs from flash memory in certain important respects, such as 
latency, access granularity, and in-place write capability, and thus requires a different 
wear leveling formalization than flash memory. 

Most existing studies on wear leveling are conducted mainly from a practical point 
of view. Often, they do not have a formal problem statement or rigorous theoretical 
analyses. While this might not be a serious problem if the only thing that matters 
is the performance, some relatively recent studies have repeatedly emphasized the 
security aspects of wear leveling [21, 23, 24, 28, 29]. In particular, it is important to 
take into account the case of malicious users who actively try to reduce the memory 
lifetime. (Consider, for example, a computing outsourcing service.) 

Below, we describe a recent theoretical study that constructed a problem formal- 
ization to capture the wear leveling for PCM explained above, and the corresponding 
solutions [18]. 

The formal problem statement is as follows. There are two parties: the user and
the server. The server has three resources: physical RAM, wear-free memory, and private
randomness. The physical RAM is RAM that consists of N B-bit cells while the wear-free
memory is RAM that consists of a small number of B-bit cells. The user provides
the server with adversarially chosen read/write queries to virtual RAM (a RAM
consisting of n b-bit cells), and the server must respond to these queries “correctly.”
That is, each request is of the form (type, i, v) where type ∈ {read, write}, $i \in [n]$,
and $v \in \{0,1\}^{b}$ and, for (read, i, ∗), where “∗” means that the third component is
arbitrary, the server must return the last value written to the ith virtual cell (the v
in the last query of the form (write, i, v)). The server not only needs to return the
correct responses but also needs to support as many write queries as possible with
high probability without updating any physical cell more than L times, where L is
a parameter. We assume $L = n^{\delta}$ for some constant $\delta > 0$. Equivalently, we define
$\delta := \log_n L$ and assume it is a constant. This assumption is reasonable even though,
in reality, L and n are independent, because L is 105-10? and $\log_L n = 1/\delta$ is at most 2 or
3 for reasonable n.

The performance metrics for wear leveling include the physical memory size,
the wear-free memory size, the number of write queries supported, and the access
overhead, which is the number of physical RAM accesses needed for each virtual
RAM access. We say a wear-leveling scheme is “optimal” if it satisfies the following
conditions (asymptotic notations are in terms of $n \to \infty$):


• $N = (1 + o(1))n$;⁶
• With high probability, that is, $1 - O(1/n)$, it can process $(1 - o(1))NL$ write
  queries without updating any physical cell more than L times;
• The processing time of each query is O(1);
• It requires only O(1) cells in the wear-free memory.


6 We ignore the cell size overhead for brevity. Security Refresh [23] described below does not have
any cell size overhead ($B' = B$) while the method of Onodera and Shibuya [18] described after that
has a cell size of $B' = B + 2\lceil \lg n \rceil + 1$.

Fig. 9.3 Movement of cells in an epoch of Security Refresh. All the numbers are binary. n = 4 (= N),
$r_0$ = “10”, and $r_1$ = “11”. Solid arrows mean cell swaps while dotted arrows mean skipped cell swaps


9.3.2 Security Refresh 


The wear leveling scheme of Seong et al. [23] is optimal if $L = n^{\delta}$ for $\delta > 1$ while
it is non-optimal (in fact, “far from” optimal) for $\delta < 1$ [18].

In this method, n virtual cells are stored in the N = n physical cells in permuted
order. The method works in epochs. At each epoch, two random lg n-bit integers $r_0$
and $r_1$ are maintained. (We assume n is a power of two for brevity.) At the start of
an epoch, for each $i \in [n]$, the ith virtual cell $v_i$ is stored in the $(i \oplus r_0)$th physical
cell $V_{i \oplus r_0}$, where $\oplus$ means bit-wise XOR. During the epoch, each $v_i$ is moved from
$V_{i \oplus r_0}$ to $V_{i \oplus r_1}$. Note that the virtual cell stored in the destination $V_{i \oplus r_1}$ of $v_i$
is $v_{i \oplus r_0 \oplus r_1}$, and its destination is $V_{i \oplus r_0}$; that is, $v_i$ and $v_{i \oplus r_0 \oplus r_1}$ swap
their positions. This is done as follows. For every t write queries processed, where t is a
parameter, we perform a remap subroutine. At the ith remap subroutine call in an epoch, we check
if $i < i \oplus r_0 \oplus r_1$. If so, $v_i$ still is in $V_{i \oplus r_0}$, and thus, we swap the contents of
$V_{i \oplus r_0}$ and $V_{i \oplus r_1}$. Otherwise, $v_i$ is already in $V_{i \oplus r_1}$, and we skip swapping. The
epoch ends after the nth remap subroutine finishes. At that point, each cell $v_i$ is stored in
$V_{i \oplus r_1}$. We update $r_0$ to $r_1$, and $r_1$ to a fresh random lg n-bit integer. Now every $v_i$ is in
$V_{i \oplus r_0}$ as required for the epoch start, and we restart another epoch at this point. See
Fig. 9.3 for an example. To access $v_i$, we access $V_{i \oplus r_1}$ if $v_i$ was already remapped in
the epoch. (We have already seen how to check this.) Otherwise, we access $V_{i \oplus r_0}$.
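
The epoch mechanism can be sketched in Python as follows. This is an illustrative reading of the description, not the original implementation: cells are indexed from 0, a cell counts as already remapped when the smaller index of its swap pair is below the number of remap calls made so far in the epoch (an assumption about how the check is carried out), and the class name `SecurityRefresh` and the `wear` counters are introduced for this sketch.

```python
import random


class SecurityRefresh:
    def __init__(self, n, t, seed=None):
        assert n >= 1 and (n & (n - 1)) == 0, "n must be a power of two"
        self.n, self.t = n, t
        self.rng = random.Random(seed)
        self.r0 = self.rng.randrange(n)
        self.r1 = self.rng.randrange(n)
        self.phys = [None] * n   # contents of the physical cells V
        self.wear = [0] * n      # update count of each physical cell
        self.remaps_done = 0     # remap calls made so far in this epoch
        self.pending = 0         # writes since the last remap call

    def _remapped(self, i):
        partner = i ^ self.r0 ^ self.r1
        return min(i, partner) < self.remaps_done

    def _addr(self, i):
        return i ^ (self.r1 if self._remapped(i) else self.r0)

    def read(self, i):
        return self.phys[self._addr(i)]

    def write(self, i, value):
        a = self._addr(i)
        self.phys[a] = value
        self.wear[a] += 1
        self.pending += 1
        if self.pending == self.t:   # one remap call per t writes
            self.pending = 0
            self._remap()

    def _remap(self):
        i = self.remaps_done
        if i < (i ^ self.r0 ^ self.r1):          # v_i has not been swapped yet
            a, b = i ^ self.r0, i ^ self.r1
            self.phys[a], self.phys[b] = self.phys[b], self.phys[a]
            self.wear[a] += 1
            self.wear[b] += 1
        self.remaps_done += 1
        if self.remaps_done == self.n:           # epoch ends: rotate the keys
            self.r0, self.r1 = self.r1, self.rng.randrange(self.n)
            self.remaps_done = 0


sr = SecurityRefresh(n=4, t=1, seed=0)
for step in range(20):
    sr.write(0, step)          # an adversary hammering a single virtual cell
print(sr.read(0), sr.wear)
```

Even when one virtual cell is written repeatedly, the physical location receiving the writes changes over time, which is what spreads the wear.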

The non-trivial part of the analysis is the proof of a high-probability guarantee
of memory lifetime. We outline the key points. Fix a physical cell and let $X_i$ be the
number of times it is updated during the ith epoch. We need to place a bound on
the probability that the sum of the $X_i$ deviates from its expected value. To do this, it
suffices to bound the deviation of the sum of odd-indexed variables $X_1, X_3, \ldots$ from
its expected value and do the same for the sum of even-indexed variables $X_2, X_4, \ldots$
separately. This is helpful because each $X_i$ is a random variable that depends on $r_0, r_1$
in the ith epoch (and the queries). Consecutive epochs share one of these integers, whereas
epochs two apart share none, and thus the odd-indexed variables $X_1, X_3, \ldots$
are independent of each other and so are the even-indexed variables. Regardless of
the queries, $X_i$ is bounded by the number of write queries processed in an epoch, tn.
Although this suggests the use of the Hoeffding inequality, it turns out that it does
not work for the case $\delta < 2$, essentially because the condition $X_i \le tn$ alone does
not capture the fact that some cell being updated many, say, ∼tn, times in an epoch
negatively affects the number of times other cells are updated in the epoch. To derive
the bound for the case $1 < \delta < 2$, we bound the second moment of $X_i$ and apply the
Bernstein inequality [4].

If the user keeps on updating $v_i$ continuously, one of $V_{i \oplus r_0}$ and $V_{i \oplus r_1}$ is
updated $tn/2 = \Omega(n)$ times during the first epoch, and this physical cell dies if
$\delta < 1$. Thus, this method is not optimal for $\delta < 1$.


9.3.3 Construction for Small Write Limit Cases 


We now briefly describe a method for achieving optimality for the case $\delta < 1$, that is,
the case where the memory is large [18]. The idea is to prepare spare cells and remap the frequently
updated cells to free spare cells adaptively. (We maintain the write counts of cells
by appending a counter to each cell.) We store pointers to the new locations in the
old locations to trace the remapped cells. To keep the number of pointers to follow
small, we connect pointers in a manner that is similar to the DFS of a complete
d-ary tree with $d^h$ leaves where d, h are parameters (see Fig. 9.4). As we continue
to process write queries, the data structure gradually degrades: the free spare cells
become scarce and the trees become saturated. To reset the degradation, we perform a
Security Refresh-style mapping. That is, we treat the structure in Fig. 9.4 as residing
in another RAM u and maintain a global mapping, a gradually changing one-to-one
map between the cells of u and the physical cells $V_1, V_2, \ldots$. Once we have globally
remapped a cell of u corresponding to a tree root, we reset the “DFS” starting from
that cell. For example, if we globally remap $u_1$ in state (5) of Fig. 9.4, we free
$u_{n+1}$ and $u_{n+4}$, resetting the DFS for the tree rooted at $u_1$. Garbage cells such as $u_{n+3}$ are
also reclaimed sooner or later when they are globally remapped. To access $v_i$, the
tree path traversal in u starting from i is simulated by translating between addresses in
u and addresses in V.

Although analysis of the bound on memory lifetime is cumbersome, the same 
idea as the analysis of Security Refresh applies. Indeed, the core argument is easier 
because the Hoeffding bound suffices. 


9.3.4 Open Problem 


The access overhead of the method for the small write limit case described above
is about $1/\delta$. It is easy to obtain amortized 1 + o(1) and worst-case O (7) access
overhead if we allow relatively large wear-free memory, for example, $O(n^{\epsilon})$ for
an appropriate constant $0 < \epsilon < 1$. It seems possible and practically relevant to
achieve amortized 1 + o(1) and worst-case O(1) access overhead in this setting. A
theoretically more interesting challenge is to give negative results that justify the use
of such large wear-free memory.

Fig. 9.4 Example evolution of u. d = h = 2. $u_1, \ldots, u_n$ are default locations while the rest are
spare cells. Each panel shows the state just after the thick-bordered cell was allocated because the
cell previously storing its content was updated and the write count reached the threshold


9.4 Conclusion 


We reviewed two recent studies on ORAM and wear leveling that achieve succinct 
space usage. Though these objects have totally different motivations and are studied 
in different communities, there are some similarities between them. As we mentioned 
in the introduction, several other concepts with similar flavors are known, including
initializable RAM, memory checking, locally decodable codes, and huge random
objects. There are probably many more such enhanced RAM instances yet to be 
found, and trying to find them can be an avenue for making progress in studies of 
data structures. 
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Part IV 
Sublinear Modelling 


Chapter 10
Review of Sublinear Modeling in Probabilistic Graphical Models by
Statistical Mechanical Informatics and Statistical Machine Learning Theory


Kazuyuki Tanaka 


Abstract We review sublinear modeling in probabilistic graphical models by statis- 
tical mechanical informatics and statistical machine learning theory. Our statistical 
mechanical informatics schemes are based on advanced mean-field methods includ- 
ing loopy belief propagations. This chapter explores how phase transitions appear in 
loopy belief propagations for prior probabilistic graphical models. The frameworks 
are mainly explained for loopy belief propagations in the Ising model which is one 
of the elementary versions of probabilistic graphical models. We also expand the 
schemes to quantum statistical machine learning theory. Our framework can provide 
us with sublinear modeling based on the momentum space renormalization group 
methods. 


10.1 Introduction 


Statistical machine learning frameworks using probabilistic graphical models are
useful for many applications in data-driven sciences, including information
communication technologies [1-3], compressed sensing [4, 5], and neural information
processing systems [6-10].

Most probabilistic graphical models belong to the exponential family [11] and can
be regarded as classical spin systems in statistical mechanical informatics [12-17].
Moreover, it is well known that many applicable formulations in data sciences as well
as computational sciences can be reduced to combinatorial problems with some
constraint conditions, which can be regarded as an Ising model in statistical mechanical
informatics [18, 19]. Much interest has also focused on applying quantum
annealing as a novel high-speed optimization technology to massive optimization
problems [20-24].
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10.2 Statistical Machine Learning 


In statistical machine learning, most of the mathematical frameworks for machine 
learning are based on maximum likelihood frameworks [25, 26] from statistical math- 
ematical sciences. The important points are how to assume the prior distribution and 
the data generative probability distribution and how to express the joint probability 
between the parameters and the data vector. In this section, we explore maximum 
likelihood frameworks in terms of model selection and parameter selection from a 
given data vector. 


10.2.1 Bayesian Statistics and Maximization of Marginal 
Likelihood 


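Let us consider a graph specified by nodes and edges, (V, E), where V is the set of all
nodes i and E is the set of all edges {i, j}. State variables $s_i$ and $d_i$ are associated with
each node i. The vectors $s = (s_1, s_2, \ldots, s_{|V|})^{\rm T}$ and $d = (d_1, d_2, \ldots, d_{|V|})^{\rm T}$
correspond to the parameters and the data vector, respectively. The state spaces of $s_i$
and $d_i$ are given by $\Omega$ and $(-\infty, +\infty)$, respectively. Now $\rho(d|s,\beta)$ and $P(s|\alpha)$, which
correspond to the data generative and prior models, respectively, are assumed to be as follows:

$$\rho(d|s,\beta) \equiv \prod_{i\in V}\sqrt{\frac{\beta}{2\pi}}\exp\Big(-\frac{1}{2}\beta(d_i-s_i)^2\Big), \tag{10.1}$$

$$P(s|\alpha) \equiv \frac{\displaystyle\prod_{\{i,j\}\in E}\exp\Big(-\frac{1}{2}\alpha(s_i-s_j)^2\Big)}{\displaystyle\sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega}\ \prod_{\{i,j\}\in E}\exp\Big(-\frac{1}{2}\alpha(s_i-s_j)^2\Big)}. \tag{10.2}$$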


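The expressions for the posterior probability $P(s|d,\alpha,\beta)$, joint probability
$\rho(s,d|\alpha,\beta)$, and marginal likelihood $\rho(d|\alpha,\beta)$ are given by Bayes formulas as
follows:

$$P(s|d,\alpha,\beta) = \frac{\rho(s,d|\alpha,\beta)}{\rho(d|\alpha,\beta)} = \frac{\rho(d|s,\beta)P(s|\alpha)}{\rho(d|\alpha,\beta)}, \tag{10.3}$$

$$\rho(s,d|\alpha,\beta) = \rho(d|s,\beta)P(s|\alpha), \tag{10.4}$$

$$\rho(d|\alpha,\beta) = \sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega}\rho(s,d|\alpha,\beta) = \sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega}\rho(d|s,\beta)P(s|\alpha). \tag{10.5}$$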
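Estimates of the hyperparameters and the parameter vector, $\hat\alpha$, $\hat\beta$, and
$\hat s = (\hat s_1, \hat s_2, \ldots, \hat s_{|V|})^{\rm T}$, are determined by

$$\big(\hat\alpha(d), \hat\beta(d)\big) = \arg\max_{(\alpha,\beta)} \rho(d|\alpha,\beta), \tag{10.6}$$

$$\hat s_i(d) = \arg\max_{s_i\in\Omega} P_i\big(s_i\big|d,\hat\alpha(d),\hat\beta(d)\big) \quad (i\in V), \tag{10.7}$$

where $P_i(s_i|d,\alpha,\beta)$ denotes the marginal of the posterior probability $P(s|d,\alpha,\beta)$ at node i.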

Equations (10.6) and (10.7) are referred to as the maximization of marginal likeli- 
hood (MML) [25, 26] and the maximization of posterior marginal (MPM) [27], 
respectively. 
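As a rough illustration of Eqs. (10.5), (10.6), and (10.7), the following sketch evaluates the marginal likelihood and the posterior marginals by explicit enumeration. It assumes a binary state space Ω = {−1, +1}, a three-node chain graph, a Gaussian data generative model ρ(d|s, β) ∝ exp(−(β/2) Σ_i (d_i − s_i)²), and a pairwise prior P(s|α) ∝ exp(−(α/2) Σ_{{i,j}} (s_i − s_j)²). The grid search in `mml` and all function names are illustrative assumptions, not the chapter's procedure, and enumeration is feasible only for very small graphs.

```python
import itertools
import math

OMEGA = (-1, +1)             # assumed binary state space
EDGES = [(0, 1), (1, 2)]     # assumed toy graph: a 3-node chain
N = 3
STATES = list(itertools.product(OMEGA, repeat=N))


def prior(s, alpha):
    """Pairwise prior P(s|alpha), normalized by explicit enumeration."""
    def weight(x):
        return math.exp(-0.5 * alpha * sum((x[i] - x[j]) ** 2 for i, j in EDGES))
    return weight(s) / sum(weight(t) for t in STATES)


def likelihood(d, s, beta):
    """Gaussian data generative model rho(d|s, beta)."""
    return math.prod(
        math.sqrt(beta / (2 * math.pi)) * math.exp(-0.5 * beta * (di - si) ** 2)
        for di, si in zip(d, s)
    )


def marginal_likelihood(d, alpha, beta):
    """rho(d|alpha, beta): sum of the joint probability over all states."""
    return sum(likelihood(d, s, beta) * prior(s, alpha) for s in STATES)


def posterior_marginals(d, alpha, beta):
    """One-body marginals P_i(s_i|d, alpha, beta) of the posterior."""
    evidence = marginal_likelihood(d, alpha, beta)
    marginals = [dict.fromkeys(OMEGA, 0.0) for _ in range(N)]
    for s in STATES:
        p = likelihood(d, s, beta) * prior(s, alpha) / evidence
        for i, si in enumerate(s):
            marginals[i][si] += p
    return marginals


def mml(d, grid):
    """Maximize the marginal likelihood over a finite grid of (alpha, beta)."""
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: marginal_likelihood(d, ab[0], ab[1]))


def mpm(d, alpha, beta):
    """Maximizer of each posterior marginal, node by node."""
    return [max(OMEGA, key=lambda x: m[x]) for m in posterior_marginals(d, alpha, beta)]
```

For example, `mpm([0.9, 1.2, -0.1], 1.0, 1.0)` labels all three nodes +1: the weakly negative third observation is outweighed by the coupling to its neighbor.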


10.2.2 Expectation-Maximization Algorithm 


The expectation-maximization (EM) algorithm is often used to maximize the marginal 
likelihood in Eq. (10.6) [25, 26]. The Q-function for the EM algorithm in the present 
framework is defined by 


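$$Q(\alpha,\beta|\alpha',\beta',d) \equiv \sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega} P(s|d,\alpha',\beta')\,\ln\big(\rho(s,d|\alpha,\beta)\big). \tag{10.8}$$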


The EM algorithm is a procedure that performs the following E- and M-steps
repeatedly for $t = 0, 1, 2, \ldots$ until $\hat\alpha(d)$ and $\hat\beta(d)$ converge:

E-step: Compute $Q(\alpha,\beta|\alpha(d,t),\beta(d,t),d)$ for various values of $\alpha$ and $\beta$.

M-step: Determine $(\alpha(d,t+1),\beta(d,t+1))$ so as to satisfy the extremum
conditions of $Q(\alpha,\beta|\alpha(d,t),\beta(d,t),d)$ with respect to $\alpha$ and $\beta$. Update
$\hat\alpha(d)\leftarrow\alpha(d,t+1)$ and $\hat\beta(d)\leftarrow\beta(d,t+1)$.
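The E- and M-steps can be sketched on a toy problem as follows. The sketch assumes Ω = {−1, +1}, a four-node chain, a Gaussian data generative model ρ(d|s, β) ∝ exp(−(β/2) Σ_i (d_i − s_i)²), and a pairwise prior P(s|α) ∝ exp(−(α/2) Σ_{{i,j}} (s_i − s_j)²). The M-step is replaced by a search over a finite grid of candidate (α, β) values rather than by solving the extremum conditions, and the Q-function is evaluated by explicit enumeration; both are simplifications made for illustration, and all names are hypothetical.

```python
import functools
import itertools
import math

OMEGA = (-1, +1)                      # assumed binary state space
EDGES = ((0, 1), (1, 2), (2, 3))      # assumed toy graph: a 4-node chain
N = 4
STATES = list(itertools.product(OMEGA, repeat=N))


def log_prior_weight(s, alpha):
    """Unnormalized log prior: -(alpha/2) * sum over edges of (s_i - s_j)^2."""
    return -0.5 * alpha * sum((s[i] - s[j]) ** 2 for i, j in EDGES)


@functools.lru_cache(maxsize=None)
def log_prior_norm(alpha):
    """Log normalization constant of the prior, by enumeration."""
    return math.log(sum(math.exp(log_prior_weight(t, alpha)) for t in STATES))


def log_joint(s, d, alpha, beta):
    """ln rho(s, d|alpha, beta) = ln rho(d|s, beta) + ln P(s|alpha)."""
    log_like = sum(0.5 * math.log(beta / (2 * math.pi)) - 0.5 * beta * (di - si) ** 2
                   for di, si in zip(d, s))
    return log_like + log_prior_weight(s, alpha) - log_prior_norm(alpha)


def posterior(d, alpha, beta):
    """Posterior P(s|d, alpha, beta) over all states and the marginal likelihood."""
    joint = [math.exp(log_joint(s, d, alpha, beta)) for s in STATES]
    evidence = sum(joint)
    return [p / evidence for p in joint], evidence


def em(d, grid, alpha, beta, steps=8):
    """Grid-restricted EM; returns the final (alpha, beta) and the likelihood history."""
    history = []
    for _ in range(steps):
        post, evidence = posterior(d, alpha, beta)
        history.append(evidence)

        # E-step: Q(a, b | alpha, beta, d) for the candidate values on the grid.
        def q(a, b):
            return sum(p * log_joint(s, d, a, b) for p, s in zip(post, STATES))

        # M-step: pick the candidate that maximizes the Q-function.
        alpha, beta = max(((a, b) for a in grid for b in grid),
                          key=lambda ab: q(ab[0], ab[1]))
    return alpha, beta, history
```

If the initial (α, β) is itself a grid point, the recorded marginal likelihood never decreases from one iteration to the next, which is the usual monotonicity property of EM.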

The update rule from $(\alpha(d,t),\beta(d,t))$ to $(\alpha(d,t+1),\beta(d,t+1))$ for the extremum
conditions can be written as

In this way, the extremum conditions can be reduced to
To realize the EM procedure as a practical algorithm, Markov chain Monte
Carlo (MCMC) methods, which are powerful probabilistic methods, are often used
[28, 29]. In some recent developments, advanced mean-field methods from statistical
mechanical informatics are also used as powerful deterministic algorithms, as
shown in Sect. 10.3. Consider the expectation values for both sides of Eqs. (10.9) and
(10.10) with respect to the state vector d of a data point according to the following
probability density function, where the hyperparameters $\alpha$ and $\beta$ are set to their true
values $\alpha^*$ and $\beta^*$, respectively:

$$\rho(d|\alpha^*,\beta^*) = \sum_{\tau}\rho(d|\tau,\beta^*)P(\tau|\alpha^*). \tag{10.16}$$
We can then derive simultaneous equations for the statistical trajectory
$\{(\alpha(\alpha^*,\beta^*,t),\beta(\alpha^*,\beta^*,t))\,|\,t = 1, 2, 3, \ldots\}$ in the convergence process
$\{(\alpha(d,t),\beta(d,t))\,|\,t = 1, 2, 3, \ldots\}$ of the above EM algorithm.

Equations (10.9) and (10.10) can be rewritten as follows: 
By taking the expectation values of both sides of Eqs. (10.18) and (10.19) with
respect to the state vector of the data point d in the probability density function
$\rho(d|\alpha^*,\beta^*)$, the simultaneous deterministic equation for the statistical trajectory
$\{(\alpha(\alpha^*,\beta^*,t),\beta(\alpha^*,\beta^*,t))\,|\,t = 1, 2, 3, \ldots\}$ of the EM procedure can be derived as
follows:
In the case of a continuous state space $\Omega = (-\infty, +\infty)$, the posterior and prior
probabilistic models correspond to Gaussian graphical models, and the statistical
trajectory in Eqs. (10.23) and (10.24) can be exactly computed by means of the
multi-dimensional Gaussian integral formula [30].

For a discrete state space $\Omega$, it is generally hard to treat Eqs. (10.23) and (10.24)
analytically. To estimate Eqs. (10.23) and (10.24), the following quantity is often
introduced in statistical mechanical informatics [13, 17]:
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The quantity in Eq. (10.25) can be rewritten as follows: 


+оо (+оо +оо 1 
|; ГА є обе", (im. 1 (Zid By — 1) ddd 


+оо г+оо 
= =f. i 4 p(dla*, B*)((Z(d, о, В)" — 1))adidd- --ddyvi 
zm 


1 +оо (+оо oo 5 
= im | f -f ПОР Yo У whao, ) ddıddz- аду — 1 


SERNER? — syjeQ 


1 +оо (+оо +00 
= lim (f f -f p(d|a*, B*) 
n—40n NJ-os J- M 


щі У У s У sese) t as) = 


j-1N5,;€Qs5,;€Q зу ;eQ 


IVI 
x opt 
а z) 
+оо р+оо +оо 
А 
летот 1l 2 


di D Xe Y)» wri sap sive, nj nne] =L (10.26) 


j-lNs,;eQso;eQ зу Є 


w(t1, 22, :- ту |4, a*, n) 




Equation (10.26) means that computation of the statistical quantity in Eq. (10.25) 
can be reduced, up to some normalization constant, to computation of the statistical 
quantity in the probabilistic model given by the weight factor 


n 
d,a*, nl Iv. 52.js t SIV 


j=l 


w(T1, T2, + Ty d,a,B). (10.27) 


We remark that the weight factor (10.27) is expressed by considering some replicas
of the posterior probabilistic model $P(s|d,\alpha,\beta)$, and the analysis starting from the
weight factor (10.27) is referred to as a replica method [13, 17]. One possible case
for analytical treatment is the EM algorithm with the prior and posterior probabilistic
models in Eqs. (10.2) and (10.3) for the complete graph (V, E). The dynamics of the
EM algorithm with the MCMC method can be analyzed by using the replica method
and the master equations for Glauber dynamics [31].¹


10.2.3 Expectation-Maximization Algorithm for Probabilistic 
Image Segmentations 


This section extends the previous section to the statistical machine learning framework
for probabilistic image segmentation. In probabilistic image segmentation,
we consider a square grid graph (V, E) in which a light intensity vector
$d_i = (d_{iR}, d_{iG}, d_{iB})$ for the three components red $d_{iR}$, green $d_{iG}$, and blue $d_{iB}$ is assigned
to each node i. The state vector s for the labeled configuration and the data matrix
D for the color image configuration are expressed as

$$s = \begin{pmatrix} s_1\\ s_2\\ \vdots\\ s_{|V|}\end{pmatrix},\qquad D = \begin{pmatrix} d_1\\ d_2\\ \vdots\\ d_{|V|}\end{pmatrix} = \begin{pmatrix} d_{1R} & d_{1G} & d_{1B}\\ d_{2R} & d_{2G} & d_{2B}\\ \vdots & \vdots & \vdots\\ d_{|V|R} & d_{|V|G} & d_{|V|B}\end{pmatrix}. \tag{10.28}$$

Here $\rho(D|s,a(+1),a(-1),C(+1),C(-1))$ and $P(s|\alpha)$ are assumed to be as follows:


p(D\s, a(+1), а(—1), C(+1), С(—1)) = [[(4: |5, a(s), С), (10.29) 
¹ Glauber dynamics was proposed in Ref. [32].
Note that the probabilistic graphical model in Eq. (10.30) is referred to as a Potts
model [33].

In probabilistic segmentation and clustering, $\rho(D|s,a(+1),a(-1),C(+1),C(-1))$
in Eq. (10.29) and $P(s|\alpha)$ in Eq. (10.30) correspond to the data generative and prior
models, respectively. The joint probability of s and D is expressed in terms of the
data generative and prior distributions, $\rho(D|s,a(+1),a(-1),C(+1),C(-1))$ and
$P(s|\alpha)$, as follows:

$$\rho(s,D|\alpha,a(+1),a(-1),C(+1),C(-1)) = \rho(D|s,a(+1),a(-1),C(+1),C(-1))\,P(s|\alpha). \tag{10.34}$$


By using the joint probability distribution, the posterior probability
$P(s|D,\alpha,a(+1),a(-1),C(+1),C(-1))$ and the marginal likelihood
$\rho(D|\alpha,a(+1),a(-1),C(+1),C(-1))$ are defined by using Bayes formulas as follows:
Estimates of the hyperparameters and parameter vector, namely, $\hat\alpha(D)$, $\hat a(+1|D)$,
$\hat a(-1|D)$, $\hat C(+1|D)$, $\hat C(-1|D)$, and $\hat s(D) = (\hat s_1(D), \hat s_2(D), \ldots, \hat s_{|V|}(D))^{\rm T}$, are
determined by
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The Q-function for the EM algorithm in the present framework is defined by 
The EM algorithm is a procedure that performs the following E-step and M-step
repeatedly for $t = 0, 1, 2, \ldots$ until $\hat\alpha(D)$, $\hat a(+1,D)$, $\hat a(-1,D)$, $\hat C(+1,D)$, and
$\hat C(-1,D)$ converge:

E-step: Compute $Q(\alpha,a(+1),a(-1),C(+1),C(-1)\,|\,\alpha(t),a(+1,t),a(-1,t),C(+1,t),C(-1,t),D)$
for various values of $a(+1)$, $a(-1)$, $C(+1)$, and $C(-1)$.

M-step: Determine $\alpha(t+1)$, $a(+1,t+1)$, $a(-1,t+1)$, $C(+1,t+1)$, and
$C(-1,t+1)$ that satisfy the extremum conditions of the Q-function with
respect to $a(+1)$, $a(-1)$, $C(+1)$, and $C(-1)$ as follows:

$$\big(\alpha(t+1),a(+1,t+1),a(-1,t+1),C(+1,t+1),C(-1,t+1)\big) \leftarrow \mathop{\rm extremum}_{\alpha,a(+1),a(-1),C(+1),C(-1)} Q\big(\alpha,a(+1),a(-1),C(+1),C(-1)\,\big|\,\alpha(t),a(+1,t),a(-1,t),C(+1,t),C(-1,t),D\big). \tag{10.40}$$

Update $\hat\alpha(D)\leftarrow\alpha(t+1)$, $\hat a(+1,D)\leftarrow a(+1,t+1)$, $\hat a(-1,D)\leftarrow a(-1,t+1)$,
$\hat C(+1,D)\leftarrow C(+1,t+1)$, and $\hat C(-1,D)\leftarrow C(-1,t+1)$.


By using the equalities in Eqs. (10.29), (10.30), (10.34), and (10.35), the EM algo- 
rithm by the Q-function can be reduced to the following simultaneous update rules: 
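$$a(s_i,t+1) = \frac{\displaystyle\sum_{i\in V} d_i\,P_i\big(s_i\big|D,\alpha(t),a(+1,t),a(-1,t),C(+1,t),C(-1,t)\big)}{\displaystyle\sum_{i\in V} P_i\big(s_i\big|D,\alpha(t),a(+1,t),a(-1,t),C(+1,t),C(-1,t)\big)} \quad (s_i\in\Omega), \tag{10.42}$$

$$C(s_i,t+1) = \frac{\displaystyle\sum_{i\in V} (d_i-a(s_i,t))^{\rm T}(d_i-a(s_i,t))\,P_i\big(s_i\big|D,\alpha(t),a(+1,t),a(-1,t),C(+1,t),C(-1,t)\big)}{\displaystyle\sum_{i\in V} P_i\big(s_i\big|D,\alpha(t),a(+1,t),a(-1,t),C(+1,t),C(-1,t)\big)} \quad (s_i\in\Omega), \tag{10.43}$$

where

$$P_i(s_i|D,\alpha,a(+1),a(-1),C(+1),C(-1)) \equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\delta_{s_i,\tau_i}\,P(\tau_1,\tau_2,\ldots,\tau_{|V|}|D,\alpha,a(+1),a(-1),C(+1),C(-1)) \quad (i\in V), \tag{10.44}$$

$$P_{ij}(s_i,s_j|D,\alpha,a(+1),a(-1),C(+1),C(-1)) = P_{ji}(s_j,s_i|D,\alpha,a(+1),a(-1),C(+1),C(-1)) \equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\delta_{s_i,\tau_i}\delta_{s_j,\tau_j}\,P(\tau_1,\tau_2,\ldots,\tau_{|V|}|D,\alpha,a(+1),a(-1),C(+1),C(-1)) \quad (\{i,j\}\in E), \tag{10.45}$$

$$P_{ij}(s_i,s_j|\alpha) = P_{ji}(s_j,s_i|\alpha) \equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\delta_{s_i,\tau_i}\delta_{s_j,\tau_j}\,P(\tau_1,\tau_2,\ldots,\tau_{|V|}|\alpha) \quad (\{i,j\}\in E). \tag{10.46}$$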
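The mean and covariance updates in Eqs. (10.42) and (10.43) are posterior-weighted averages over the nodes. The following sketch shows one such update, assuming that the one-body posterior marginals of Eq. (10.44) have already been computed by some other routine (for example, by loopy belief propagation) and are passed in as plain dictionaries. The function name, argument layout, and the use of the current mean a(ξ, t) inside the covariance update are assumptions made for illustration.

```python
def m_step_means_and_covariances(D, marginals, means, labels=(+1, -1)):
    """One posterior-weighted update of the label-wise mean and covariance.

    D         : list of RGB rows d_i = [d_iR, d_iG, d_iB]
    marginals : list of dicts, marginals[i][xi] = P_i(xi | D, ...) at node i
    means     : dict, means[xi] = current mean vector a(xi, t)
    """
    new_means, new_covs = {}, {}
    for xi in labels:
        # Total posterior weight assigned to label xi over all nodes.
        weight = sum(p[xi] for p in marginals)
        # Weighted mean of the color vectors.
        new_means[xi] = [
            sum(p[xi] * d[c] for p, d in zip(marginals, D)) / weight
            for c in range(3)
        ]
        # Weighted 3x3 covariance around the current mean a(xi, t).
        a = means[xi]
        new_covs[xi] = [
            [
                sum(p[xi] * (d[r] - a[r]) * (d[c] - a[c])
                    for p, d in zip(marginals, D)) / weight
                for c in range(3)
            ]
            for r in range(3)
        ]
    return new_means, new_covs
```

With hard (0 or 1) marginals the update reduces to the ordinary per-class sample mean and covariance, which is a convenient sanity check.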


10.3 Statistical Mechanical Informatics 


In statistical mechanical informatics [13-17], Ising models are very familiar
probabilistic models for which computations are done by statistical mechanical
techniques, including advanced mean-field methods, renormalization group methods,
Monte Carlo simulations, and replica methods [36, 37]. This section reviews the
framework of the Ising model and associated advanced mean-field methods.²


² A review of both exact results and approximate results as well as perturbative computations for
Ising models is given in Refs. [34, 35].
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10.3.1 Ising Model 


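Let us consider an Ising model defined by the following probability distribution for
the state space $\Omega = \{+1, -1\}$ for the state variable $s_i$ at each node i ($\in V$):

$$P\Big(s\Big|d,\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T}\Big) = P\Big(s_1,s_2,\ldots,s_{|V|}\Big|d_1,d_2,\ldots,d_{|V|},\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T}\Big) \equiv \frac{\exp\Big(-\dfrac{1}{k_{\rm B}T}\Big(\dfrac{1}{2}J\displaystyle\sum_{\{i,j\}\in E}(s_i-s_j)^2 + \dfrac{1}{2}h\displaystyle\sum_{i\in V}(s_i-d_i)^2\Big)\Big)}{\displaystyle\sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega}\exp\Big(-\dfrac{1}{k_{\rm B}T}\Big(\dfrac{1}{2}J\displaystyle\sum_{\{i,j\}\in E}(s_i-s_j)^2 + \dfrac{1}{2}h\displaystyle\sum_{i\in V}(s_i-d_i)^2\Big)\Big)}$$
$$(J>0,\ h>0,\ T>0,\ d_i\in(-\infty,+\infty)\ (\forall i\in V)). \tag{10.47}$$

Because $s_i^2 = 1$ ($i\in V$), the probability distribution $P(s)$ can be reduced to

$$P\Big(s\Big|d,\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T}\Big) = \frac{1}{Z}\exp\Big(-\frac{1}{k_{\rm B}T}H(s)\Big) \quad (T>0), \tag{10.48}$$

$$H(s) = H(s_1,s_2,\ldots,s_{|V|}) \equiv -J\sum_{\{i,j\}\in E}s_is_j - h\sum_{i\in V}d_is_i \quad (J>0,\ h>0,\ d_i\in(-\infty,+\infty)\ (\forall i\in V)), \tag{10.49}$$

$$Z \equiv \sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega}\exp\Big(-\frac{1}{k_{\rm B}T}H(s)\Big), \tag{10.50}$$

where $H(s)$ and $Z$ are referred to in statistical mechanical informatics as the energy
function (or Hamiltonian) and the partition function, respectively. The probability
distribution in Eq. (10.47) is called the Gibbs distribution, $k_{\rm B}$ is the Boltzmann
constant, T is the (absolute) temperature, J is the (ferromagnetic) interaction,
and h is the external field.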

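Let us consider the Kullback-Leibler divergence

$$\mathrm{KL}[P\|R] \equiv \sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega} R(s)\,\ln\Bigg(\frac{R(s)}{P\big(s\big|d,\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T}\big)}\Bigg), \tag{10.51}$$

which is always non-negative for two probability distributions $P(s|d,\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T})$ and
$R(s)$ and is regarded as a pseudo-distance between them. By substituting the explicit
expression for $P(s)$ in Eqs. (10.48), (10.49), and (10.50) into Eq. (10.51), the expression
for the Kullback-Leibler divergence (10.51) in terms of the partition function Z
and the free energy functional $F[R]$ can be derived as follows:

$$\mathrm{KL}[P\|R] = \frac{1}{k_{\rm B}T}\big(k_{\rm B}T\ln(Z) + F[R]\big), \tag{10.52}$$

where

$$F[R] \equiv \sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega} H(s)R(s) + k_{\rm B}T\sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega} R(s)\ln(R(s)). \tag{10.53}$$

For the free energy functional $F[R]$, it is valid that

$$\arg\min_{R}\Big\{F[R]\ \Big|\ \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}R(\tau_1,\tau_2,\ldots,\tau_{|V|}) = 1\Big\} = P\Big(s\Big|d,\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T}\Big), \tag{10.54}$$

$$\min_{R}\Big\{F[R]\ \Big|\ \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}R(\tau_1,\tau_2,\ldots,\tau_{|V|}) = 1\Big\} = -k_{\rm B}T\ln(Z). \tag{10.55}$$

Note that $-k_{\rm B}T\ln(Z)$ is referred to as the free energy for the Gibbs distribution in
Eq. (10.47).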
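The relation between the Kullback-Leibler divergence, the partition function, and the free energy functional can be checked numerically on a very small graph. The sketch below assumes the energy function H(s) = −J Σ_{{i,j}} s_i s_j − h Σ_i d_i s_i with s_i ∈ {+1, −1}, enumerates all states of a three-node chain, and evaluates F[R] for a uniform trial distribution R and for the Gibbs distribution itself. The graph, the numerical parameter values, and the function names are arbitrary choices for illustration.

```python
import itertools
import math


def energy(s, d, edges, J, h):
    """H(s) = -J * sum_{ij} s_i s_j - h * sum_i d_i s_i."""
    pair = sum(s[i] * s[j] for i, j in edges)
    field = sum(di * si for di, si in zip(d, s))
    return -J * pair - h * field


def gibbs(d, edges, J, h, kT):
    """Return all states, their Gibbs probabilities, and the partition function Z."""
    states = list(itertools.product((+1, -1), repeat=len(d)))
    weights = [math.exp(-energy(s, d, edges, J, h) / kT) for s in states]
    Z = sum(weights)
    return states, [w / Z for w in weights], Z


def free_energy_functional(R, states, d, edges, J, h, kT):
    """F[R] = sum_s H(s) R(s) + kT * sum_s R(s) ln R(s)."""
    mean_energy = sum(r * energy(s, d, edges, J, h) for r, s in zip(R, states))
    neg_entropy = sum(r * math.log(r) for r in R if r > 0.0)
    return mean_energy + kT * neg_entropy


def kl_divergence(R, P):
    """sum_s R(s) ln(R(s) / P(s)), the quantity written KL[P||R] in the text."""
    return sum(r * math.log(r / p) for r, p in zip(R, P) if r > 0.0)


d, edges, J, h, kT = [0.5, -0.2, 1.0], [(0, 1), (1, 2)], 1.0, 1.0, 1.5
states, P, Z = gibbs(d, edges, J, h, kT)
R = [1.0 / len(states)] * len(states)      # a uniform trial distribution
F_R = free_energy_functional(R, states, d, edges, J, h, kT)
F_P = free_energy_functional(P, states, d, edges, J, h, kT)
```

Here `kl_divergence(R, P)` agrees with `(kT * math.log(Z) + F_R) / kT` up to rounding, and `F_P` agrees with `-kT * math.log(Z)`, the minimum of the free energy functional.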


10.3.2 Advanced Mean-Field Method 


This section reviews the fundamental framework of advanced mean-field methods
[12], including the mean-field approximation [35-37] and the Bethe approximation
[35, 39-41]. Our framework is given for the Ising model in Eqs. (10.48), (10.49),
and (10.50). It is known that a generalization of the present framework can be realized
by using the cluster variation method in Refs. [42-45].

We introduce a trial probability distribution $R(s) = R(s_1, s_2, \ldots, s_{|V|})$ which is
restricted to the following functional form:

$$R(s) = R(s_1,s_2,\ldots,s_{|V|}) = \prod_{i\in V}R_i(s_i), \tag{10.56}$$

$$R_i(s_i) \equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\delta_{s_i,\tau_i}\,R(\tau_1,\tau_2,\ldots,\tau_{|V|}) \quad (i\in V). \tag{10.57}$$

By using the definition of $R_i(s_i)$ and the normalization condition

$$\sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}R(\tau_1,\tau_2,\ldots,\tau_{|V|}) = 1, \tag{10.58}$$

we confirm that

$$\sum_{\tau_i\in\Omega}R_i(\tau_i) = R_i(+1) + R_i(-1) = 1 \quad (i\in V). \tag{10.59}$$

By substituting the expression for $R(s)$ in Eq. (10.56), in terms of the marginal probability
distributions $R_i(s_i)$ ($i\in V$) such that $R_i = \begin{pmatrix} R_i(+1) & 0\\ 0 & R_i(-1)\end{pmatrix}$, into Eq. (10.53), the
free energy functional $F[R]$ can be reduced to the following mean-field free energy
functional:

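$$F[R] = F_{\rm MF}\big[\{R_i|i\in V\}\big], \tag{10.60}$$

where

$$F_{\rm MF}\big[\{R_i|i\in V\}\big] = F_{\rm MF}\Big[\Big\{\begin{pmatrix} R_i(+1) & 0\\ 0 & R_i(-1)\end{pmatrix}\Big|i\in V\Big\}\Big] \equiv -J\sum_{\{i,j\}\in E}\Big(\sum_{\tau_i\in\Omega}\tau_iR_i(\tau_i)\Big)\Big(\sum_{\tau_j\in\Omega}\tau_jR_j(\tau_j)\Big) - h\sum_{i\in V}d_i\Big(\sum_{\tau_i\in\Omega}\tau_iR_i(\tau_i)\Big) + k_{\rm B}T\sum_{i\in V}\sum_{\tau_i\in\Omega}R_i(\tau_i)\ln(R_i(\tau_i)). \tag{10.61}$$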

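Let us consider the following conditional minimization of the free energy functional:

$$\big\{\hat R_i\big|i\in V\big\} = \Big\{\begin{pmatrix}\hat R_i(+1) & 0\\ 0 & \hat R_i(-1)\end{pmatrix}\Big|i\in V\Big\} = \arg\min_{\{R_i|i\in V\}}\Big\{F_{\rm MF}\big[\{R_i|i\in V\}\big]\ \Big|\ \sum_{\tau_i\in\Omega}R_i(\tau_i) = 1,\ i\in V\Big\}. \tag{10.62}$$

First we introduce the Lagrange multipliers $\lambda_i$ ($i\in V$) to ensure the normalization
conditions $\sum_{\tau_i\in\Omega}R_i(\tau_i) = 1$ ($i\in V$) as follows:

$$\mathcal{L}_{\rm MF}\big[\{R_i|i\in V\}\big] \equiv F_{\rm MF}\big[\{R_i|i\in V\}\big] - \sum_{i\in V}\lambda_i\Big(\sum_{\tau_i\in\Omega}R_i(\tau_i) - 1\Big). \tag{10.63}$$

$\hat R_i$ ($i\in V$) are determined so as to satisfy the following extremum condition: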




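$$\Big[\frac{\partial}{\partial R_i(+1)}\mathcal{L}_{\rm MF}\big[\{R_i|i\in V\}\big]\Big]_{\{R_i=\hat R_i|i\in V\}} = 0 \quad (i\in V), \tag{10.64}$$

$$\Big[\frac{\partial}{\partial R_i(-1)}\mathcal{L}_{\rm MF}\big[\{R_i|i\in V\}\big]\Big]_{\{R_i=\hat R_i|i\in V\}} = 0 \quad (i\in V), \tag{10.65}$$

such that

$$\Big[\frac{\partial}{\partial R_i}\mathcal{L}_{\rm MF}\big[\{R_i|i\in V\}\big]\Big]_{\{R_i=\hat R_i|i\in V\}} = 0 \quad (i\in V). \tag{10.66}$$

It can be shown that $\hat R_i$ ($i\in V$) are derived as follows:

$$\hat R_i(s_i) = \exp\Big(-1+\frac{\lambda_i}{k_{\rm B}T}\Big)\exp\Big(\frac{1}{k_{\rm B}T}\Big(hd_i + J\sum_{j\in\partial i}\Big(\sum_{\tau_j\in\Omega}\tau_j\hat R_j(\tau_j)\Big)\Big)s_i\Big) \quad (i\in V,\ s_i\in\Omega), \tag{10.67}$$

where $\partial i$ denotes the set of all nodes neighboring node i. Finally, $\lambda_i$ needs to be
determined such that it satisfies the normalization condition of the marginal probability
$\hat R_i(s_i)$. The marginal probabilities $\{\hat R_i|i\in V\}$ are derived as

$$\hat R_i(s_i) = \frac{\exp\Big(\dfrac{1}{k_{\rm B}T}\Big(hd_i + J\displaystyle\sum_{j\in\partial i}\Big(\sum_{\tau_j\in\Omega}\tau_j\hat R_j(\tau_j)\Big)\Big)s_i\Big)}{\displaystyle\sum_{\tau_i\in\Omega}\exp\Big(\dfrac{1}{k_{\rm B}T}\Big(hd_i + J\displaystyle\sum_{j\in\partial i}\Big(\sum_{\tau_j\in\Omega}\tau_j\hat R_j(\tau_j)\Big)\Big)\tau_i\Big)} \quad (i\in V,\ s_i\in\Omega). \tag{10.68}$$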

We introduce the local magnetization

$$m_i \equiv \sum_{\tau_i\in\Omega}\tau_i\hat R_i(\tau_i). \tag{10.69}$$

By solving the simultaneous equations

$$\sum_{\tau_i\in\Omega}\hat R_i(\tau_i) = \hat R_i(+1) + \hat R_i(-1) = 1 \quad (i\in V), \tag{10.70}$$

$$\sum_{\tau_i\in\Omega}\tau_i\hat R_i(\tau_i) = \hat R_i(+1) - \hat R_i(-1) = m_i \quad (i\in V), \tag{10.71}$$

with respect to $\hat R_i(+1)$ and $\hat R_i(-1)$, we derive the following expression for the
marginal probability:

$$\hat R_i(s_i) = \frac{1}{2}(1 + m_is_i) \quad (i\in V). \tag{10.72}$$

The extremum conditions in Eq. (10.68) can be reduced to the following simultaneous
deterministic equation of $\{m_i|i\in V\}$:

$$m_i = \tanh\Big(\frac{1}{k_{\rm B}T}\Big(hd_i + J\sum_{j\in\partial i}m_j\Big)\Big) \quad (i\in V), \tag{10.73}$$

which is referred to as the mean-field equation.³

By substituting Eq. (10.72) into Eq. (10.61), the mean-field free energy functional
can be reduced to

$$F_{\rm MF}\big[\{\hat R_i(+1),\hat R_i(-1)|i\in V\}\big] = F_{\rm MF}(m_1,m_2,\ldots,m_{|V|}), \tag{10.74}$$

$$F_{\rm MF}(m_1,m_2,\ldots,m_{|V|}) \equiv -J\sum_{\{i,j\}\in E}m_im_j - h\sum_{i\in V}d_im_i + k_{\rm B}T\sum_{i\in V}\frac{1}{2}(1+m_i)\ln\Big(\frac{1}{2}(1+m_i)\Big) + k_{\rm B}T\sum_{i\in V}\frac{1}{2}(1-m_i)\ln\Big(\frac{1}{2}(1-m_i)\Big). \tag{10.75}$$

The extremum conditions

$$\frac{\partial}{\partial m_i}F_{\rm MF}(m_1,m_2,\ldots,m_{|V|}) = 0 \quad (i\in V) \tag{10.76}$$

can be reduced to the mean-field equations in Eq. (10.73).
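The mean-field equation m_i = tanh((h d_i + J Σ_{j∈∂i} m_j)/(k_B T)) can be solved numerically by simple fixed-point iteration. The sketch below is one such solver, with synchronous updates starting from m_i = 0; the function names, the stopping rule, and the default iteration limits are assumptions. Plain iteration of this kind is not guaranteed to converge in general, although it is a contraction when J times the maximum degree is smaller than k_B T.

```python
import math


def mean_field_magnetizations(d, edges, J, h, kT, max_iter=1000, tol=1e-12):
    """Iterate m_i <- tanh((h*d_i + J*sum_{j in neighbors(i)} m_j) / kT)."""
    n = len(d)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    m = [0.0] * n
    for _ in range(max_iter):
        new_m = [
            math.tanh((h * d[i] + J * sum(m[j] for j in neighbors[i])) / kT)
            for i in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(new_m, m))
        m = new_m
        if change < tol:
            break
    return m


def mean_field_marginal(m_i, s_i):
    """One-body marginal (1 + m_i * s_i) / 2 for s_i in {+1, -1}."""
    return 0.5 * (1.0 + m_i * s_i)
```

For instance, `mean_field_magnetizations([1.0, -0.5, 0.3, 0.8], [(0, 1), (1, 2), (2, 3)], 1.0, 1.0, 4.0)` returns four magnetizations in (−1, 1) that satisfy the mean-field equation to within the tolerance.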

We now explore the framework of the Bethe approximation for the Ising model in 
Eqs. (10.48), (10.49), and (10.50). Our framework is based on the cluster variation 
method [39, 42—45]. 

We introduce a trial probability distribution $R(s) = R(s_1, s_2, \ldots, s_{|V|})$ that is
restricted to the following functional form:


3 Equation (10.73) is often referred to as the naive mean-field equation in statistical machine 
learning theory [2, 3, 12]. 
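$$R(s) = R(s_1,s_2,\ldots,s_{|V|}) = \Big(\prod_{i\in V}R_i(s_i)\Big)\Big(\prod_{\{i,j\}\in E}\frac{R_{ij}(s_i,s_j)}{R_i(s_i)R_j(s_j)}\Big) = \Big(\prod_{i\in V}R_i(s_i)^{1-|\partial i|}\Big)\Big(\prod_{\{i,j\}\in E}R_{ij}(s_i,s_j)\Big), \tag{10.77}$$

where

$$R_{ij}(s_i,s_j) = R_{ji}(s_j,s_i) \equiv \sum_{\tau_1\in\Omega}\sum_{\tau_2\in\Omega}\cdots\sum_{\tau_{|V|}\in\Omega}\delta_{s_i,\tau_i}\delta_{s_j,\tau_j}\,R(\tau_1,\tau_2,\ldots,\tau_{|V|}) \quad (\{i,j\}\in E). \tag{10.78}$$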

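By using Eqs. (10.57) and (10.78), we can derive the normalization and reducibility
conditions in the marginal probabilities as follows:

$$\sum_{\tau_i\in\Omega}R_i(\tau_i) = 1 \quad (i\in V), \qquad \sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}R_{ij}(\tau_i,\tau_j) = 1 \quad (\{i,j\}\in E), \tag{10.79}$$

$$R_i(s_i) = \sum_{\tau_j\in\Omega}R_{ij}(s_i,\tau_j), \qquad R_j(s_j) = \sum_{\tau_i\in\Omega}R_{ij}(\tau_i,s_j) \quad (\{i,j\}\in E). \tag{10.80}$$

By substituting the explicit expression for $P(s)$ and the expression $\ln(R(s))$ in
terms of the marginal probability distributions $R_i = \begin{pmatrix} R_i(+1) & 0\\ 0 & R_i(-1)\end{pmatrix}$ ($i\in V$) and

$$R_{ij} = \begin{pmatrix} R_{ij}(+1,+1) & 0 & 0 & 0\\ 0 & R_{ij}(+1,-1) & 0 & 0\\ 0 & 0 & R_{ij}(-1,+1) & 0\\ 0 & 0 & 0 & R_{ij}(-1,-1)\end{pmatrix} \quad (\{i,j\}\in E)$$

in Eq. (10.77) into Eq. (10.51), the Kullback-Leibler divergence can be reduced to the
following expression in terms of the partition function Z and the Bethe free energy
functional $F_{\rm Bethe}[\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}]$:

$$\mathrm{KL}[P\|R] = \frac{1}{k_{\rm B}T}\Big(k_{\rm B}T\ln(Z) + F_{\rm Bethe}\big[\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}\big]\Big), \tag{10.81}$$

where

$$F_{\rm Bethe}\big[\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}\big] \equiv -J\sum_{\{i,j\}\in E}\Big(\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}\tau_i\tau_jR_{ij}(\tau_i,\tau_j)\Big) - h\sum_{i\in V}d_i\Big(\sum_{\tau_i\in\Omega}\tau_iR_i(\tau_i)\Big) + k_{\rm B}T\sum_{i\in V}(1-|\partial i|)\Big(\sum_{\tau_i\in\Omega}R_i(\tau_i)\ln(R_i(\tau_i))\Big) + k_{\rm B}T\sum_{\{i,j\}\in E}\Big(\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}R_{ij}(\tau_i,\tau_j)\ln(R_{ij}(\tau_i,\tau_j))\Big). \tag{10.82}$$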


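Let us consider the following conditional minimization of the Bethe free energy
functional:

$$\Big(\big\{\hat R_i\big|i\in V\big\},\big\{\hat R_{ij}\big|\{i,j\}\in E\big\}\Big) = \arg\min_{\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}}\Big\{F_{\rm Bethe}\big[\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}\big]\ \Big|\ \sum_{\tau_i\in\Omega}R_i(\tau_i) = 1\ (i\in V),\ \sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}R_{ij}(\tau_i,\tau_j) = 1\ (\{i,j\}\in E),$$
$$R_i(\tau_i) = \sum_{\tau_j\in\Omega}R_{ij}(\tau_i,\tau_j)\ (j\in\partial i,\ i\in V),\ R_j(\tau_j) = \sum_{\tau_i\in\Omega}R_{ij}(\tau_i,\tau_j)\ (j\in\partial i,\ i\in V)\Big\}. \tag{10.83}$$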


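We introduce the Lagrange multipliers $\lambda_i$ ($i\in V$), $\lambda_{\{i,j\}}$,
$\lambda_{i,\{i,j\}}(+1) = \lambda_{i,\{j,i\}}(+1)$, and $\lambda_{i,\{i,j\}}(-1) = \lambda_{i,\{j,i\}}(-1)$ ($\{i,j\}\in E$) to ensure the
normalization and reducibility conditions as follows:

$$\mathcal{L}_{\rm Bethe}\big[\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}\big] \equiv F_{\rm Bethe}\big[\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}\big] - \sum_{i\in V}\lambda_i\Big(\sum_{\tau_i\in\Omega}R_i(\tau_i) - 1\Big) - \sum_{\{i,j\}\in E}\lambda_{\{i,j\}}\Big(\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}R_{ij}(\tau_i,\tau_j) - 1\Big)$$
$$- \sum_{i\in V}\sum_{j\in\partial i}\lambda_{i,\{i,j\}}(+1)\Big(R_i(+1) - \sum_{\tau_j\in\Omega}R_{ij}(+1,\tau_j)\Big) - \sum_{i\in V}\sum_{j\in\partial i}\lambda_{i,\{i,j\}}(-1)\Big(R_i(-1) - \sum_{\tau_j\in\Omega}R_{ij}(-1,\tau_j)\Big). \tag{10.84}$$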
ieVjeði TjEQ 


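The marginal probabilities $\hat R_i$ ($i\in V$) and $\hat R_{ij}$ ($\{i,j\}\in E$) are determined so as to
satisfy the following extremum conditions:

$$\Big[\frac{\partial}{\partial R_i}\mathcal{L}_{\rm Bethe}\big[\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}\big]\Big]_{\{R_i=\hat R_i|i\in V\},\{R_{ij}=\hat R_{ij}|\{i,j\}\in E\}} = 0 \quad (i\in V), \tag{10.85}$$

$$\Big[\frac{\partial}{\partial R_{ij}}\mathcal{L}_{\rm Bethe}\big[\{R_i|i\in V\},\{R_{ij}|\{i,j\}\in E\}\big]\Big]_{\{R_i=\hat R_i|i\in V\},\{R_{ij}=\hat R_{ij}|\{i,j\}\in E\}} = 0 \quad (\{i,j\}\in E). \tag{10.86}$$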


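It can be shown that $\hat R_i(s_i)$ ($i\in V$) and $\hat R_{ij}(s_i,s_j) = \hat R_{ji}(s_j,s_i)$ ($\{i,j\}\in E$) are
derived as follows:

$$\hat R_i(s_i) = \exp\Big(-1+\frac{\lambda_i}{(1-|\partial i|)k_{\rm B}T}\Big)\times\exp\Big(\frac{1}{(1-|\partial i|)k_{\rm B}T}\Big(hd_is_i + \sum_{j\in\partial i}\lambda_{i,\{i,j\}}(s_i)\Big)\Big) \quad (i\in V), \tag{10.87}$$

$$\hat R_{ij}(s_i,s_j) = \hat R_{ji}(s_j,s_i) = \exp\Big(-1+\frac{\lambda_{\{i,j\}}}{k_{\rm B}T}\Big)\times\exp\Big(\frac{1}{k_{\rm B}T}\Big(Js_is_j - \lambda_{i,\{i,j\}}(s_i) - \lambda_{j,\{i,j\}}(s_j)\Big)\Big) \quad (\{i,j\}\in E). \tag{10.88}$$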


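Finally, $\lambda_i$ and $\lambda_{\{i,j\}}$ need to be determined so as to satisfy the normalization conditions
of the marginal probabilities $\hat R_i(s_i)$ and $\hat R_{ij}(s_i,s_j)$.
By introducing the messages $M_{j\to i}(s_i)$ and $M_{i\to j}(s_j)$ in the transformations

$$\exp\Big(-\frac{\lambda_{i,\{i,j\}}(s_i)}{k_{\rm B}T}\Big) = \Big(\prod_{k\in\partial i\setminus\{j\}}M_{k\to i}(s_i)\Big)\exp\Big(\frac{1}{k_{\rm B}T}hd_is_i\Big), \tag{10.89}$$

$$\exp\Big(-\frac{\lambda_{j,\{i,j\}}(s_j)}{k_{\rm B}T}\Big) = \Big(\prod_{l\in\partial j\setminus\{i\}}M_{l\to j}(s_j)\Big)\exp\Big(\frac{1}{k_{\rm B}T}hd_js_j\Big), \tag{10.90}$$

the expressions of the marginal probabilities $\hat R_i$ and $\hat R_{ij}$ in Eqs. (10.87) and (10.88)
can be reduced to the following expressions:

$$\hat R_i(s_i) = \frac{1}{Z_i}\Big(\prod_{k\in\partial i}M_{k\to i}(s_i)\Big)\exp\Big(\frac{1}{k_{\rm B}T}hd_is_i\Big) \quad (i\in V), \tag{10.91}$$

$$\hat R_{ij}(s_i,s_j) = \hat R_{ji}(s_j,s_i) = \frac{1}{Z_{\{i,j\}}}\Big(\prod_{k\in\partial i\setminus\{j\}}M_{k\to i}(s_i)\Big)\Big(\prod_{l\in\partial j\setminus\{i\}}M_{l\to j}(s_j)\Big)\exp\Big(\frac{1}{k_{\rm B}T}\big(Js_is_j + hd_is_i + hd_js_j\big)\Big) \quad (\{i,j\}\in E), \tag{10.92}$$

$$Z_i \equiv \sum_{\tau_i\in\Omega}\Big(\prod_{k\in\partial i}M_{k\to i}(\tau_i)\Big)\exp\Big(\frac{1}{k_{\rm B}T}hd_i\tau_i\Big) \quad (i\in V), \tag{10.93}$$

$$Z_{\{i,j\}} \equiv \sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}\Big(\prod_{k\in\partial i\setminus\{j\}}M_{k\to i}(\tau_i)\Big)\Big(\prod_{l\in\partial j\setminus\{i\}}M_{l\to j}(\tau_j)\Big)\exp\Big(\frac{1}{k_{\rm B}T}\big(J\tau_i\tau_j + hd_i\tau_i + hd_j\tau_j\big)\Big) \quad (\{i,j\}\in E). \tag{10.94}$$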
B 

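By substituting Eqs. (10.91) and (10.92) into the reducibility conditions in Eq.
(10.80), the simultaneous deterministic equations for the messages can be derived as
follows:

$$M_{j\to i}(s_i) = \frac{Z_i}{Z_{\{i,j\}}}\sum_{\tau_j\in\Omega}\Big(\prod_{k\in\partial j\setminus\{i\}}M_{k\to j}(\tau_j)\Big)\exp\Big(\frac{1}{k_{\rm B}T}\big(Js_i\tau_j + hd_j\tau_j\big)\Big) \quad (\{i,j\}\in E), \tag{10.95}$$

$$M_{i\to j}(s_j) = \frac{Z_j}{Z_{\{i,j\}}}\sum_{\tau_i\in\Omega}\Big(\prod_{k\in\partial i\setminus\{j\}}M_{k\to i}(\tau_i)\Big)\exp\Big(\frac{1}{k_{\rm B}T}\big(J\tau_is_j + hd_i\tau_i\big)\Big) \quad (\{i,j\}\in E). \tag{10.96}$$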


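The Bethe free energy functional is given by

$$F_{\rm Bethe}\big[\{\hat R_i|i\in V\},\{\hat R_{ij}|\{i,j\}\in E\}\big] = -k_{\rm B}T\sum_{i\in V}(1-|\partial i|)\ln Z_i - k_{\rm B}T\sum_{\{i,j\}\in E}\ln Z_{\{i,j\}}. \tag{10.97}$$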
The framework in the Bethe approximation using Eqs. (10.91), (10.92), (10.93),
and (10.94) with Eqs. (10.95) and (10.96) is referred to as loopy belief propagation
in statistical machine learning theory [12, 46-48]. The present derivation
is based on the cluster variation method in Refs. [39, 42-45]. Recently,
some novel approaches for loopy belief propagation methods have been proposed,
including the approximate message passing algorithm [49] and the replica cluster
variation method [50, 51]. A review summarizing recent developments in loopy
belief propagation methods is given in Ref. [52].
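The message passing scheme can be sketched as follows for the Ising energy H(s) = −J Σ_{{i,j}} s_i s_j − h Σ_i d_i s_i with s_i ∈ {+1, −1}. Each message from node j to node i is obtained by summing over the state of j the product of the incoming messages from the other neighbors of j and the factor exp((J s_i τ_j + h d_j τ_j)/(k_B T)). The sketch normalizes each message to sum to one instead of carrying the ratio of normalization constants, which does not change the normalized beliefs. The synchronous update schedule, the fixed iteration count, and all names are assumptions; `exact_marginals` is only a brute-force reference for small graphs.

```python
import itertools
import math

STATES = (+1, -1)


def loopy_belief_propagation(d, edges, J, h, kT, iterations=100):
    """Return the one-body beliefs computed from normalized messages."""
    n = len(d)
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # messages[(j, i)][s_i] is the message sent from node j to node i.
    messages = {(j, i): {s: 0.5 for s in STATES} for i in range(n) for j in nbrs[i]}
    for _ in range(iterations):
        updated = {}
        for (j, i) in messages:
            out = {}
            for s_i in STATES:
                total = 0.0
                for t_j in STATES:
                    term = math.exp((J * s_i * t_j + h * d[j] * t_j) / kT)
                    for k in nbrs[j]:
                        if k != i:
                            term *= messages[(k, j)][t_j]
                    total += term
                out[s_i] = total
            norm = sum(out.values())
            updated[(j, i)] = {s: v / norm for s, v in out.items()}
        messages = updated
    beliefs = []
    for i in range(n):
        b = {s: math.exp(h * d[i] * s / kT)
                * math.prod(messages[(k, i)][s] for k in nbrs[i])
             for s in STATES}
        norm = sum(b.values())
        beliefs.append({s: v / norm for s, v in b.items()})
    return beliefs


def exact_marginals(d, edges, J, h, kT):
    """Brute-force one-body marginals of the Gibbs distribution (small graphs only)."""
    n = len(d)
    marg = [dict.fromkeys(STATES, 0.0) for _ in range(n)]
    Z = 0.0
    for s in itertools.product(STATES, repeat=n):
        H = -J * sum(s[i] * s[j] for i, j in edges) - h * sum(x * y for x, y in zip(d, s))
        w = math.exp(-H / kT)
        Z += w
        for i, s_i in enumerate(s):
            marg[i][s_i] += w
    return [{s: v / Z for s, v in m.items()} for m in marg]
```

On a graph without cycles the beliefs coincide with the exact marginals, so comparing the two functions on a small tree is a useful check; on graphs with cycles the beliefs are only approximations.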
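By solving

$$\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}\hat R_{ij}(\tau_i,\tau_j) = \hat R_{ij}(+1,+1) + \hat R_{ij}(-1,+1) + \hat R_{ij}(+1,-1) + \hat R_{ij}(-1,-1) = 1, \tag{10.98}$$

$$\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}\tau_i\hat R_{ij}(\tau_i,\tau_j) = \hat R_{ij}(+1,+1) - \hat R_{ij}(-1,+1) + \hat R_{ij}(+1,-1) - \hat R_{ij}(-1,-1) = m_i, \tag{10.99}$$

$$\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}\tau_j\hat R_{ij}(\tau_i,\tau_j) = \hat R_{ij}(+1,+1) + \hat R_{ij}(-1,+1) - \hat R_{ij}(+1,-1) - \hat R_{ij}(-1,-1) = m_j, \tag{10.100}$$

$$\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}\tau_i\tau_j\hat R_{ij}(\tau_i,\tau_j) = \hat R_{ij}(+1,+1) - \hat R_{ij}(-1,+1) - \hat R_{ij}(+1,-1) + \hat R_{ij}(-1,-1) = c_{\{i,j\}} = c_{\{j,i\}}, \tag{10.101}$$

as simultaneous linear equations for $\hat R_{ij}(+1,+1)$, $\hat R_{ij}(-1,+1)$, $\hat R_{ij}(+1,-1)$, and
$\hat R_{ij}(-1,-1)$, we can confirm the following equality:

$$\hat R_{ij}(s_i,s_j) = \frac{1}{4}\big(1 + m_is_i + m_js_j + c_{\{i,j\}}s_is_j\big). \tag{10.102}$$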


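By substituting Eqs. (10.72) and (10.102) into Eq. (10.82), the Bethe free energy
functional can be reduced to

$$F_{\rm Bethe}\big[\{\hat R_i|i\in V\},\{\hat R_{ij}|\{i,j\}\in E\}\big] = F_{\rm Bethe}\big(\{m_i|i\in V\},\{c_{\{i,j\}}|\{i,j\}\in E\}\big), \tag{10.103}$$

$$F_{\rm Bethe}\big(\{m_i|i\in V\},\{c_{\{i,j\}}|\{i,j\}\in E\}\big) \equiv -J\sum_{\{i,j\}\in E}c_{\{i,j\}} - h\sum_{i\in V}d_im_i + k_{\rm B}T\sum_{i\in V}(1-|\partial i|)\sum_{s_i=\pm1}\hat R_i(s_i)\ln(\hat R_i(s_i)) + k_{\rm B}T\sum_{\{i,j\}\in E}\ \sum_{s_i=\pm1}\sum_{s_j=\pm1}\hat R_{ij}(s_i,s_j)\ln(\hat R_{ij}(s_i,s_j)). \tag{10.104}$$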


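The extremum conditions

$$\frac{\partial}{\partial m_i}F_{\rm Bethe}\big(\{m_i|i\in V\},\{c_{\{i,j\}}|\{i,j\}\in E\}\big) = 0 \quad (i\in V), \tag{10.105}$$

$$\frac{\partial}{\partial c_{\{i,j\}}}F_{\rm Bethe}\big(\{m_i|i\in V\},\{c_{\{i,j\}}|\{i,j\}\in E\}\big) = 0 \quad (\{i,j\}\in E) \tag{10.106}$$

can be reduced to the following simultaneous equations:

$$\frac{hd_i}{k_{\rm B}T} = \frac{1}{2}(1-|\partial i|)\sum_{\tau_i\in\Omega}\tau_i\ln(\hat R_i(\tau_i)) + \frac{1}{4}\sum_{j\in\partial i}\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}\tau_i\ln(\hat R_{ij}(\tau_i,\tau_j)) \quad (i\in V), \tag{10.107}$$

$$\frac{J}{k_{\rm B}T} = \frac{1}{4}\sum_{\tau_i\in\Omega}\sum_{\tau_j\in\Omega}\tau_i\tau_j\ln(\hat R_{ij}(\tau_i,\tau_j)) \quad (\{i,j\}\in E). \tag{10.108}$$

The schemes for the derivations of Eqs. (10.107) and (10.108) from the Bethe free
energy (Eqs. (10.103)-(10.104)) are given in Refs. [41, 53-55].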

In the advanced mean-field method, some researchers are interested in perturbative
computation of the correction terms with respect to $J/(k_{\rm B}T)$ from the mean-field
free energy [56, 57], which is referred to as a Thouless-Anderson-Palmer (TAP)
free energy. The scheme used in the derivations has been extended to a classical
Heisenberg model [58]. One familiar perturbative method in statistical mechanical
informatics is the Plefka expansion, in which we obtain higher-order correction terms
with respect to $J/(k_{\rm B}T)$ from the mean-field free energy [12]. By substituting Eq. (10.102)
into Eq. (10.108), $c_{\{i,j\}}$ can be expressed in terms of $m_i$, $m_j$, and $J/(k_{\rm B}T)$. It is known
that the TAP equation can be derived by expanding the expression for $c_{\{i,j\}}$ up to the
second-order term $(J/(k_{\rm B}T))^2$ with respect to an infinitesimal $J/(k_{\rm B}T)$ and by substituting it
into Eq. (10.82) with Eqs. (10.72) and (10.102) [41]. The fundamental framework
of the TAP free energy and its expansion using the advanced mean-field method has
been clarified [59]. The Bethe free energy functional and the TAP free energy as well
as loopy belief propagation have been applied to Boltzmann machine learning [53,
60-64]. Some recent developments appear in Chap. 7 of Part 3 in this book.

The EM schemes with advanced mean-field methods in the previous sections have
been applied to noise reduction in probabilistic image processing [30, 51, 65-69].
The basic frameworks are based on Eqs. (10.14) and (10.15) with the two-body and
Fig. 10.1 Fundamental framework of Bayesian noise reduction by generalized sparse prior and
additive white Gaussian noise. The panel, titled "Bayesian Noise Reductions by Generalized Sparse Prior",
summarizes the generalized sparse prior $P(s|p,\alpha) \propto \prod_{\{i,j\}\in E}\prod_{c\in\{R,G,B\}}\exp(-\frac{1}{2}\alpha|s_{ic}-s_{jc}|^p)$,
the data generative model $P(d|s,\beta) \propto \prod_{i\in V}\prod_{c\in\{R,G,B\}}\exp(-\frac{1}{2}\beta(d_{ic}-s_{ic})^2)$, the deterministic
equations of the hyperparameters $\alpha$ and $\beta$, and the EM procedure (E step, M step, and MPM step), in which
the marginals are computed by loopy belief propagation in each step


one-body posterior marginal probability distributions in Eqs. (10.11) and (10.12) as 
well as the two-body prior marginal probability distribution in Eq. (10.13). They can 
be computed by means of the message passing algorithms in Eqs. (10.91) and (10.92) 
with Eqs. (10.93), (10.94), (10.95), and (10.96) for the Ising model in Eqs. (10.47), 
(10.48), and (10.50) with the prior and posterior probability distributions in Eqs. 
(10.2) and (10.3), respectively. The framework and some numerical experimental 
results are shown in Figs. 10.1 and 10.2, respectively. Moreover, loopy belief
propagation is applicable to Bayesian image segmentation in the framework of Sect.
10.2.3 [70]. It is also useful for community detection by means of the stochastic
block model for modular networks [71-73].
Fig. 10.2 Numerical experiments in Bayesian noise reduction by the generalized sparse prior and
additive white Gaussian noise (K. Tanaka, M. Yasuda and ...). Panels: original RGB color image s with 256 grades;
degraded image d (SNR: red 4.02 dB, green 2.52 dB, blue 3.83 dB); restored images for p = 0.3
(SNR: red 13.09 dB, green 12.07 dB, blue 12.77 dB), p = 1.0 (SNR: red 12.07 dB, green 11.09 dB,
blue 11.83 dB), and p = 2.0, the Gaussian graphical model (SNR: red 11.28 dB, green 10.43 dB,
blue 11.10 dB)


10.3.3 Free Energy Landscapes and Phase Transitions 
in the Thermodynamic Limit 


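In this section, we consider the Ising model defined by

$$P\Big(s\Big|\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T}\Big) = \frac{1}{Z\big(\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T}\big)}\exp\Big(-\frac{1}{k_{\rm B}T}H(s)\Big), \tag{10.109}$$

where

$$H(s) = H(s_1,s_2,\ldots,s_{|V|}) \equiv -J\sum_{\{i,j\}\in E}s_is_j - h\sum_{i\in V}s_i \quad (J>0), \tag{10.110}$$

$$Z\Big(\frac{J}{k_{\rm B}T},\frac{h}{k_{\rm B}T}\Big) \equiv \sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega}\exp\Big(-\frac{1}{k_{\rm B}T}H(s)\Big). \tag{10.111}$$

The energy function in Eq. (10.110) corresponds to the one in Eq. (10.49) for the
case of $d_i = 1$ for every node i ($\in V$). In the present section, we consider a regular graph of
degree 4 that includes a square grid graph with periodic boundary conditions along
the x- and y-directions as shown in Fig. 10.3.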
Fig. 10.3 Square grid graph (V, E) with periodic boundary conditions along the x- and y-directions in the
case V = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

For the Ising model in Eq. (10.110) and its partition function in Eq. (10.50), we
have the free energy per node

$$f(J,h,T) \equiv -k_{\rm B}T\times\lim_{|V|\to+\infty}\frac{1}{|V|}\ln(Z), \tag{10.112}$$

the internal energy for zero external field 
д 


1 
(a) art h=0, r) 


1 J h 
= n —55)P lom pz =), 10.113 
viuis) 2. м sisj) (s Т’ kgT ) ( ) 


i, j}€E s 


Ju 


and the spontaneous magnetization 


: д 1 
т+ = (а) (peru, h, г)) 
КТ 


1 J h 
Saa ia ee P Сорт ‚ 10.114 
iy fim rao 7 (s ЊТ ur) ( ) 


as important statistical quantities in the thermodynamic limit $|V| \to +\infty$. The
existence of the thermodynamic limit $|V| \to +\infty$ means that the limit on the right-hand
side of Eq. (10.112) converges. Sufficient conditions for the existence of the
thermodynamic limit of the Ising model of Eqs. (10.109), (10.110), and (10.111) have been
given by Ruelle in Ref. [38].
In the thermodynamic limit $|V| \to +\infty$ for the Ising model in Eq. (10.110) on a
square grid graph with periodic boundary conditions along the x- and y-direction as
shown in Fig. 10.3,


5 2sinh( 24.) V J 
E coth( A ) 14 (оа ( 2/ ) Ju 1 (ir) sini) | | ae |, 
J kgT kgT T 0 cosh?( 2J ) 


(10.115) 
J h 
m, = lim lim sis;P (s —. — = 0 
[лае j;|>+00|V|>+00 s kgT kgT 
(0) (к> < }aresinh(1)) 
, (10.116) 


1 
(1 E sin *(27.))' (5 > larcsinh(1)) 


where $r_i$ is the position vector of each node $i$ ($\in V$) [34, 74, 75]. In Eq. (10.116), the
spontaneous magnetizations $m_+$ and $m_-$ correspond to the branches $m_+ > 0$ and
$m_- < 0$, respectively. They are shown in Fig. 10.4. Note that for the Ising model
in Eq. (10.110) on such regular graphs,

$$m_i\left(\frac{J}{k_B T}, \frac{h}{k_B T}\right) \equiv \sum_{s_1\in\Omega}\sum_{s_2\in\Omega}\cdots\sum_{s_{|V|}\in\Omega} s_i P\left(s \,\middle|\, \frac{J}{k_B T}, \frac{h}{k_B T}\right), \tag{10.117}$$

for every $i$ ($\in V$), does not depend on $i$ and can be expressed as $m\left(\frac{J}{k_B T}, \frac{h}{k_B T}\right)$.
In the mean-field approximation of the previous subsection, the spontaneous
magnetizations

$$m_\pm = \lim_{h\to\pm 0}\,\lim_{|V|\to+\infty} m\left(\frac{J}{k_B T}, \frac{h}{k_B T}\right) \tag{10.118}$$

are given as solutions of the following mean-field equation:

$$m_\pm = \tanh\left(\frac{1}{k_B T}\left(4J m_\pm + h\right)\right) \quad (h \to \pm 0), \tag{10.119}$$

and the internal energy $Ju$ in Eq. (10.113) in the mean-field approximation is given
as

$$u = -m_\pm^{2}. \tag{10.120}$$

The solutions of Eq. (10.119) correspond to the extremum values of the following
mean-field free energy:

[Figure 10.4, panel content: Onsager solution for the Ising model on the square grid graph (V, E) with periodic boundary conditions. Curves of the internal energy $Ju$ and the spontaneous magnetizations $m_+$ and $m_-$; the phase transition point $\frac{J}{k_B T} = \frac{1}{2}\mathrm{arcsinh}(1)$ is marked.]

Fig. 10.4 Internal energy Ju in Eq. (10.113) and magnetization m+ in Eq. (10.114) in the Onsager 
solution in the Ising model of Eqs. (10.48) and (10.50) with Eq. (10.110) on the square grid graph 
(V, E) with the periodic boundary conditions along the x- and y-direction 


$$f_{\rm MF}(m) \equiv \frac{1}{|V|} F_{\rm MF}(m, m, \cdots, m) = -2Jm^{2} + k_B T\left(\tfrac{1}{2}(1+m)\right)\ln\left(\tfrac{1}{2}(1+m)\right) + k_B T\left(\tfrac{1}{2}(1-m)\right)\ln\left(\tfrac{1}{2}(1-m)\right), \tag{10.121}$$

which corresponds to $\frac{1}{|V|} F_{\rm MF}(m_1, m_2, \cdots, m_{|V|})$ in Eq. (10.75) for h = 0. The
spontaneous magnetization $m_\pm$ and the internal energy $u$ for h = 0 are computed
numerically by setting $0 < |h|/(k_B T) < 10^{-5}$ and using the iteration method for
Eq. (10.119). The graphs of $\left(\frac{J}{k_B T}, u\right)$ and $\left(\frac{J}{k_B T}, m_\pm\right)$ are shown in Fig. 10.5. Moreover, the
graphs of the mean-field free energy $f_{\rm MF}(m)$ against $m$ for $\frac{J}{k_B T}$ = 0.20, 0.25, and 0.40 are shown in Fig. 10.6.

It is known that the mean-field equation always has the trivial solution $m_\pm = 0$ and
begins to have non-trivial solutions with $m_+ > 0$ and $m_- < 0$ in the region
$\frac{J}{k_B T} > \frac{1}{4}$. This can be seen by expanding the right-hand side of Eq. (10.119)
around $m = 0$ and keeping the first-order term in $m$.
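The iteration method mentioned above can be illustrated with a minimal sketch. It assumes that the mean-field equation (10.119) on a regular graph of degree 4 reads m = tanh(4Km + b) with K = J/(k_B T) and b = h/(k_B T), and that the free energy of Eq. (10.121) is measured in units of k_B T. The function names, the tolerance, and the small positive field are illustrative choices, not taken from the source.

```python
import math

def mean_field_magnetization(K, b=1e-6, degree=4, tol=1e-12, max_iter=200000):
    """Fixed-point iteration m <- tanh(degree*K*m + b).

    K = J/(k_B T) and b = h/(k_B T); a small positive b selects the m_+ branch.
    """
    m = 1.0 if b >= 0 else -1.0
    for _ in range(max_iter):
        m_new = math.tanh(degree * K * m + b)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m  # not fully converged (can happen very close to K = 1/4)

def mean_field_free_energy(m, K):
    """f_MF(m)/(k_B T) for h = 0 on a degree-4 regular graph."""
    def xlogx(x):
        return x * math.log(x) if x > 0.0 else 0.0
    return -2.0 * K * m * m + xlogx(0.5 * (1.0 + m)) + xlogx(0.5 * (1.0 - m))

if __name__ == "__main__":
    for K in (0.20, 0.25, 0.40):
        m = mean_field_magnetization(K)
        # u = -m^2 is the mean-field internal energy in the text's normalization
        print(K, m, -m * m, mean_field_free_energy(m, K))
```

Below K = 1/4 the iteration collapses to a magnetization of the order of the applied field, whereas above it the iteration settles on the non-trivial branch, whose free energy lies below that of m = 0.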


[Figure 10.5, panel content: Mean-field approximation for the Ising model on the square grid graph (V, E) with periodic boundary conditions. Curves of $u$ and $m_\pm$ with the phase transition point marked; the plotted branches are the global minimum (stable) states of the mean-field free energy. For $h \to +0$, $(P_i(+1), P_i(-1)) \to (1, 0)$ in the limit of large $\frac{J}{k_B T}$.]

Fig. 10.5 Internal energy u from Eq. (10.113) and magnetization $m_\pm$ from Eq. (10.118) in mean-field
approximation for the Ising model in Eqs. (10.48), (10.50), and (10.110) on the regular graph
(V, E) of degree 4

Fig. 10.6 Free energy from Eq. (10.121) in mean-field approximation for the Ising model in Eqs.
(10.48), (10.50), and (10.110) on the regular graph (V, E) of degree 4. a $\frac{J}{k_B T}$ = 0.2, $\frac{h}{k_B T}$ = 0.
b $\frac{J}{k_B T}$ = 0.25, $\frac{h}{k_B T}$ = 0. c $\frac{J}{k_B T}$ = 0.4, $\frac{h}{k_B T}$ = 0

Next, we consider the Bethe approximation for the Ising model in Eqs. (10.48),
(10.50), and (10.110) on the regular graph (V, E) of degree 4. In this case, the average
$m_i$, the correlation $c_{ij}$, and the messages $\mu_{j\to i}(+1)$ and $\mu_{j\to i}(-1)$ do not depend on
$i$ and $j$, and can be expressed as $m$, $c$, $\mu(+1)$, and $\mu(-1)$. We now introduce


$$\lambda \equiv \frac{1}{2} k_B T \ln\left(\frac{\mu(+1)}{\mu(-1)}\right). \tag{10.122}$$

The message passing equations in Eqs. (10.95) and (10.96) are reduced to

$$\frac{\lambda}{k_B T} = \mathrm{arctanh}\left(\tanh\left(\frac{J}{k_B T}\right)\tanh\left(\frac{h + 3\lambda}{k_B T}\right)\right). \tag{10.123}$$

Moreover, since the marginal probabilities $R_i(+1)$ and $R_i(-1)$ are also independent
of $i$, we can derive the expression for the magnetization in terms of $\lambda$ as follows:

$$m_{\rm Bethe}\left(\frac{J}{k_B T}, \frac{h}{k_B T}\right) = \frac{1}{|V|}\sum_{i\in V}\left(\sum_{s_i\in\Omega} s_i R_i(s_i)\right) = \tanh\left(\frac{h + 4\lambda}{k_B T}\right). \tag{10.124}$$

For the infinitesimally small limits of $h$, such that $h \to +0$ and $h \to -0$, the
magnetization $m_{\rm Bethe}\left(\frac{J}{k_B T}, \frac{h}{k_B T}\right)$ in Eq. (10.124) can be computed numerically by using the
iteration method. Moreover, Eqs. (10.104) and (10.108) can be reduced to


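$$f_{\rm Bethe}(m, c) \equiv \frac{1}{|V|} F_{\rm Bethe}\left[\{m_i \,|\, i\in V\}, \{c_{ij} \,|\, \{i,j\}\in E\}\right]$$
$$= -2Jc - hm - 3k_B T\left(\tfrac{1}{2}(1+m)\right)\ln\left(\tfrac{1}{2}(1+m)\right) - 3k_B T\left(\tfrac{1}{2}(1-m)\right)\ln\left(\tfrac{1}{2}(1-m)\right)$$
$$\quad + 2k_B T\left(\tfrac{1}{4}(1+2m+c)\right)\ln\left(\tfrac{1}{4}(1+2m+c)\right) + 4k_B T\left(\tfrac{1}{4}(1-c)\right)\ln\left(\tfrac{1}{4}(1-c)\right)$$
$$\quad + 2k_B T\left(\tfrac{1}{4}(1-2m+c)\right)\ln\left(\tfrac{1}{4}(1-2m+c)\right), \tag{10.125}$$

where $m_i = m$ for every $i \in V$ and $c_{ij} = c$ for every $\{i,j\} \in E$, and

$$\frac{J}{k_B T} = \frac{1}{4}\ln\left(\frac{(1+c)^{2} - 4m^{2}}{(1-c)^{2}}\right), \tag{10.126}$$

such that

$$c = \frac{1}{\tanh\left(\frac{2J}{k_B T}\right)}\left(1 - \sqrt{1 - (1 - 2m^{2})\tanh^{2}\left(\frac{2J}{k_B T}\right) - 2m^{2}\tanh\left(\frac{2J}{k_B T}\right)}\right). \tag{10.127}$$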
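As a consistency check on the relation between Eqs. (10.126) and (10.127), the following sketch evaluates the correlation c at the extremum for a given m and verifies that it reproduces the coupling K = J/(k_B T). It assumes the readings (1/4) ln(((1+c)^2 - 4m^2)/(1-c)^2) for Eq. (10.126) and the square-root expression in tanh(2K) for Eq. (10.127); the function names `c_hat` and `coupling_from_c` are hypothetical.

```python
import math

def c_hat(m, K):
    """Correlation c at the extremum for given m, assumed form of Eq. (10.127)."""
    t = math.tanh(2.0 * K)
    inner = 1.0 - (1.0 - 2.0 * m * m) * t * t - 2.0 * m * m * t
    return (1.0 - math.sqrt(inner)) / t

def coupling_from_c(m, c):
    """J/(k_B T) recovered from m and c, assumed form of Eq. (10.126)."""
    return 0.25 * math.log(((1.0 + c) ** 2 - 4.0 * m * m) / (1.0 - c) ** 2)

if __name__ == "__main__":
    for K in (0.25, math.atanh(1.0 / 3.0), 0.40):
        for m in (0.0, 0.2, 0.5):
            c = c_hat(m, K)
            print(K, m, c, coupling_from_c(m, c))
```

At m = 0 the expression reduces to c = tanh(J/(k_B T)), the nearest-neighbor correlation of the paramagnetic Bethe solution.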


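The graphs of

$$\left(m,\ \frac{1}{k_B T} f_{\rm Bethe}\left(m,\ \frac{1}{\tanh\left(\frac{2J}{k_B T}\right)}\left(1 - \sqrt{1 - (1-2m^{2})\tanh^{2}\left(\frac{2J}{k_B T}\right) - 2m^{2}\tanh\left(\frac{2J}{k_B T}\right)}\right)\right)\right) \tag{10.128}$$

for $\frac{J}{k_B T}$ = 0.25, $\frac{J}{k_B T} = \mathrm{arctanh}\left(\frac{1}{3}\right)$, and $\frac{J}{k_B T}$ = 0.40 in the case of h = 0 are shown in
Figs. 10.7, 10.8, and 10.9, respectively. Figure 10.7 shows the internal energy u from
Eq. (10.113) and the spontaneous magnetization $m_\pm$ in Eq. (10.118) in loopy belief
propagation (Bethe approximation) for the Ising model in Eqs. (10.48), (10.50), and
(10.110) on the regular graph (V, E) of degree 4. These quantities $u$ and $m_\pm$ are
obtained by

$$u_{\rm Bethe}\left(\frac{J}{k_B T}, \frac{h}{k_B T} = 0\right) = -\frac{\cosh\left(\frac{6\lambda}{k_B T}\right) - \exp\left(-\frac{2J}{k_B T}\right)}{\cosh\left(\frac{6\lambda}{k_B T}\right) + \exp\left(-\frac{2J}{k_B T}\right)}, \tag{10.129}$$

$$m_{\rm Bethe}\left(\frac{J}{k_B T}, \frac{h}{k_B T} = 0\right) = \tanh\left(\frac{4\lambda}{k_B T}\right), \tag{10.130}$$

where

$$\frac{\lambda}{k_B T} = \mathrm{arctanh}\left(\tanh\left(\frac{J}{k_B T}\right)\tanh\left(\frac{3\lambda}{k_B T}\right)\right). \tag{10.131}$$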

These always give the same results as in Eq. (10.110) on the regular graph (V, E)
of degree 4. In particular, it is known that the results for Eqs. (10.48), (10.50), and
(10.110) on the regular tree graph (V, E) of degree 4 are exact. It is known that Eq.
(10.123) always has the trivial solution $\lambda = 0$, but begins to have non-trivial
solutions in the region $\frac{J}{k_B T} > \mathrm{arctanh}\left(\frac{1}{3}\right)$, as can be seen by expanding the right-hand side of
Eq. (10.123) around $\lambda = 0$ and keeping the first-order term in $\lambda$. In Fig. 10.7, the
blue curves correspond to global minimum states, which are stable, and the red
lines correspond to local maximum states, which are unstable, for each value
of $\frac{J}{k_B T}$ in the Bethe free energy $f_{\rm Bethe}(m, c)$ of Eq. (10.125) for the case of h = 0.
The Bethe free energy landscapes $f_{\rm Bethe}(m, c)$ of Eq. (10.125) in the case of h = 0
for several values of $\frac{J}{k_B T}$ are shown in Figs. 10.8 and 10.9.

[Figure 10.7, panel content: Loopy belief propagation for the Ising model on the regular graph (V, E) with degree 4. Curves of $u$ and $m_\pm$; the phase transition point $\frac{J}{k_B T} = \mathrm{arctanh}\left(\frac{1}{3}\right)$ is marked.]

Fig. 10.7 Internal energy u from Eq. (10.113) and magnetization $m_\pm$ from Eq. (10.118) in loopy
belief propagation (Bethe approximation) for the Ising model in Eqs. (10.48), (10.50), and (10.110)
on the regular graph (V, E) of degree 4

Now we consider the $|\Omega|$-state Potts model [33] given by


Pls J ho hı T hioi-1 
kgT' ЕТ Т ° kgT 


ПЕ «) (ПП=( 8, 3 (10.132) 


N|— 
[Figure 10.8, panel content: surface plots of $\frac{1}{k_B T} f_{\rm Bethe}(m, c)$ over $(m, c)$ for panels a, b, and c.]

Fig. 10.8 Bethe free energy $f_{\rm Bethe}(m, c)$ for the Ising model in Eqs. (10.48), (10.50), and
(10.110) on the regular graph (V, E) of degree 4. a $\frac{J}{k_B T}$ = 0.25, $\frac{h}{k_B T}$ = 0. b $\frac{J}{k_B T} = \mathrm{arctanh}\left(\frac{1}{3}\right)$,
$\frac{h}{k_B T}$ = 0. c $\frac{J}{k_B T}$ = 0.4, $\frac{h}{k_B T}$ = 0

[Figure 10.9, panel content: curves of $\frac{1}{k_B T} f_{\rm Bethe}(m, \hat{c}(m))$ against $m$ for panels a, b, and c, where $\hat{c}(m)$ denotes the right-hand side of Eq. (10.127).]

Fig. 10.9 Bethe free energy extremum $f_{\rm Bethe}(m, \hat{c}(m))$ for the Ising model of Eqs. (10.48) and (10.50)
with Eq. (10.110) on the regular graph (V, E) with degree 4. a $\frac{J}{k_B T}$ = 0.25, $\frac{h}{k_B T}$ = 0. b $\frac{J}{k_B T} = \mathrm{arctanh}\left(\frac{1}{3}\right)$,
$\frac{h}{k_B T}$ = 0. c $\frac{J}{k_B T}$ = 0.4, $\frac{h}{k_B T}$ = 0


ze y ey (IL Gs ) (n [Teo( 2 às. J) (10.133) 


SIESSEN  syyjeQ ieVneg 


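where $\Omega = \{0, 1, 2, \cdots, |\Omega| - 1\}$. By similar arguments to those for Eqs. (10.72)
and (10.102), the marginal probabilities $R_i(s_i)$ and $R_{ij}(s_i, s_j)$ can be expressed as
orthonormal expansions as follows:

$$R_i(s_i) = \frac{1}{|\Omega|} + \sum_{k\in\Omega\setminus\{0\}} m_i^{(k)}\Phi_k(s_i), \tag{10.134}$$

$$R_{ij}(s_i, s_j) = \left(\frac{1}{|\Omega|}\right)^{2} + \frac{1}{|\Omega|}\sum_{k\in\Omega\setminus\{0\}} m_i^{(k)}\Phi_k(s_i) + \frac{1}{|\Omega|}\sum_{l\in\Omega\setminus\{0\}} m_j^{(l)}\Phi_l(s_j) + \sum_{k\in\Omega\setminus\{0\}}\sum_{l\in\Omega\setminus\{0\}} c_{ij}^{(k,l)}\Phi_k(s_i)\Phi_l(s_j), \tag{10.135}$$

where $\{\Phi_k(s_i) \,|\, s_i\in\Omega, k\in\Omega\}$ is the set of orthonormal polynomials satisfying the
following relationships:

$$\Phi_0(s_i) = \left(\frac{1}{|\Omega|}\right)^{1/2}, \tag{10.136}$$

$$\sum_{s_i\in\Omega}\Phi_k(s_i)\Phi_l(s_i) = \delta_{k,l} \quad (k\in\Omega,\ l\in\Omega). \tag{10.137}$$

Because it is valid that

$$\sum_{s_i\in\Omega}\sum_{s_j\in\Omega}\Phi_k(s_i)\Phi_l(s_j)\delta_{s_i,s_j} = \sum_{s_i\in\Omega}\Phi_k(s_i)\Phi_l(s_i) = \delta_{k,l} \quad (k\in\Omega,\ l\in\Omega), \tag{10.138}$$

we have the following orthonormal expansion of $\delta_{s_i,s_j}$:

$$\delta_{s_i,s_j} = \sum_{k\in\Omega}\sum_{l\in\Omega}\left(\sum_{s'_i\in\Omega}\sum_{s'_j\in\Omega}\Phi_k(s'_i)\Phi_l(s'_j)\delta_{s'_i,s'_j}\right)\Phi_k(s_i)\Phi_l(s_j) = \sum_{k\in\Omega}\Phi_k(s_i)\Phi_k(s_j) \quad (s_i\in\Omega,\ s_j\in\Omega). \tag{10.139}$$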
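The orthonormality relation and the resulting expansion of the Kronecker delta can be checked numerically. The sketch below uses a cosine (DCT-II type) basis whose zeroth member is the constant 1/sqrt(|Ω|). This particular basis is an assumption made purely for illustration; the source does not state which orthonormal set it uses, and any set satisfying the constant-Φ_0 condition and the orthonormality condition would do.

```python
import math

def cosine_basis(q):
    """Return phi with phi[k][s] for k, s in {0, ..., q-1}.

    phi[0] is the constant 1/sqrt(q); the remaining rows are DCT-II type
    cosines, which are mutually orthonormal (illustrative choice of basis).
    """
    phi = [[1.0 / math.sqrt(q)] * q]
    for k in range(1, q):
        row = [math.sqrt(2.0 / q) * math.cos(math.pi * k * (s + 0.5) / q)
               for s in range(q)]
        phi.append(row)
    return phi

def gram(phi):
    """Matrix of sum_s phi_k(s) phi_l(s); should be the identity."""
    q = len(phi)
    return [[sum(phi[k][s] * phi[l][s] for s in range(q)) for l in range(q)]
            for k in range(q)]

def delta_expansion(phi):
    """Matrix of sum_k phi_k(s) phi_k(t); should reproduce delta_{s,t}."""
    q = len(phi)
    return [[sum(phi[k][s] * phi[k][t] for k in range(q)) for t in range(q)]
            for s in range(q)]

if __name__ == "__main__":
    for q in (3, 4):
        phi = cosine_basis(q)
        print(q, gram(phi), delta_expansion(phi))
```

Because the basis matrix is square and its rows are orthonormal, its columns are orthonormal as well, which is exactly the statement that the sum over k of Φ_k(s_i)Φ_k(s_j) equals the Kronecker delta.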


By using Eqs. (10.134) and (10.135) and the orthonormal expansion of the two-body 
interaction part of the Potts model, the Bethe free energy functional for the Potts 
model in Eqs. (10.132) and (10.133) can be reduced to 


Жм | {RilieV}, (8,10, NEE} | 
= Foaie( [mf ie V. keQ\(0}}, [etin |t. ЛЕЕ, k, Ie (0) ).(10.140) 


where 
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Fass ( mf" iev, keQN(0)). {efi} |{i, ЛЄЕ, k, 1eQ\{0}}) 


=D) - PAL A du eJ» X qn 


ieV | cien to) (i. j)eEkeQN(0) 


+kgT Y (1 — |92) Ri GoIn(R; Gi) 


ieV 
+&вТ УХ Rij(si, spIn(RijGo s;)). (10.141) 
{i,jJEE 
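For the spatially uniform case, $m_i^{(k)}$ and $c_{ij}^{(k,l)}$ are independent of $i$ and $\{i, j\}$
and can be represented by $m^{(k)}$ and $c^{(k,l)}$ in the Bethe free energy in
Eq. (10.141). For the three-state and four-state Potts model, the Bethe free energy in
Eq. (10.141) can be represented by

$$F_{\rm Bethe}\left(\begin{pmatrix} m^{(1)} \\ m^{(2)} \end{pmatrix}, \begin{pmatrix} c^{(1,1)} & c^{(1,2)} \\ c^{(2,1)} & c^{(2,2)} \end{pmatrix}\right), \tag{10.142}$$

and

$$F_{\rm Bethe}\left(\begin{pmatrix} m^{(1)} \\ m^{(2)} \\ m^{(3)} \end{pmatrix}, \begin{pmatrix} c^{(1,1)} & c^{(1,2)} & c^{(1,3)} \\ c^{(2,1)} & c^{(2,2)} & c^{(2,3)} \\ c^{(3,1)} & c^{(3,2)} & c^{(3,3)} \end{pmatrix}\right), \tag{10.143}$$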

respectively. Figures 10.10 and 10.11 show the internal energy with no external fields 


J ho hy hioi-i 
р = 0, = 0,.-:, = 0 |, 
= vil ла, 2, 29) p(s BT BT Т КВТ 


(jleE s 
(10.144) 


in loopy belief propagation (Bethe approximation) on the regular graph (V, E) of 
degree 4. We now consider also the moments m? and m! as order parameters 
for the three-state and four-state Potts model, respectively, for the following cases: 
(D lim m®, hy = № = 0, 
ho>+0 
(ID lim m®, ho = ho = 0, 
hi +0 


(10.145) 
(Ш) „бт то), ho =h; = 0, 
2c 


(IV) m® under ho = hy = hz = 0 and (0) = (1) = 42) = 1, 


for the three-state Potts model, and 
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(D lim m® hı = hy = h3 — 0, 
ho—--0 
(П) lim m®, ho = hz = h3 = 0, 
hi—40 
(Ш) lim m ло = а =h3 =0, (10.146) 
hy—>+0 
(IV) lim m®, hg = hy = № = 0, 
ha3—4-0 


(V) m® under ho = hy = һә = 0 and (0) = и(1) = 4) = HG) = 1, 


for the four-state Potts model. These are also shown in Figs. 10.10 and 10.11. In 
Figs. 10.10 and 10.11, blue, green, and red lines show the global minimum states, 
local minimum states, and local maximum states, respectively, of the Bethe free 
energies which are given by Eq. (10.142) for the three-state Potts model and by Eq. 
(10.143) for the four-state Potts model. In the global minimum states, there exist
discontinuous points in these order parameters as well as in $u$. For the Ising model,
the first derivative $Ju$ of the free energy with respect to $\frac{1}{k_B T}$ is always continuous,
whereas the second derivative diverges or has a discontinuity, as shown in Figs. 10.4, 10.5, and
10.7. This kind of singularity is referred to as a second-order phase transition in
statistical mechanics. For the Potts models, however, the first derivative $Ju$ of the free
energy with respect to $\frac{1}{k_B T}$ has a discontinuity, as shown in Figs. 10.10 and 10.11. This singularity is
referred to as a first-order phase transition in statistical mechanics. Figures 10.12


and 10.13 show the Bethe free energy landscapes 


1 m O co D 012) 
MO S 
foem (m^, m?) = саш (Е {сор cam |}. 


IV | 
(“2 ca») 


for the three-state Potts model and 


| т® ү үс qiia gb 
Ља (m 9, m®) = — extremum Feethe | | m? |, | 20 c@?) cd ; 
М (2n c2 3) mO J X00 202 6D 
mO, 


(10.147) 


cD 62.2) c23) 
cB) 6922) c63) 


(10.148) 


for the four-state Potts model, respectively. 


10.3.4 Ising Model on a Complete Graph


This section considers a complete graph (V, E) for which the energy function H(s)
is defined by

$$H(s) = H(s_1, s_2, \cdots, s_{|V|}) \equiv -\frac{J}{|V|}\sum_{\{i,j\}\in E} s_i s_j - h\sum_{i\in V} s_i \quad (J > 0), \tag{10.149}$$

instead of Eq. (10.110) in Eqs. (10.109) and (10.111). Note that the interaction
between every pair of connected nodes is set to $\frac{J}{|V|}$ to guarantee the existence of the
thermodynamic limit $|V| \to +\infty$ for the complete graph in the sense of Ruelle in
Ref. [38].

[Figure 10.10, panel content: Loopy belief propagation for the three-state Potts model on the regular graph (V, E) with degree 4, $s_i \in \Omega = \{0, 1, 2\}$.]

Fig. 10.10 Internal energy u in Eq. (10.144) and the order parameters for cases (I) to (IV) of
Eq. (10.145) in loopy belief propagation (Bethe approximation) for the three-state Potts model in
Eqs. (10.132) and (10.133) on the regular graph (V, E) of degree 4
[Figure 10.11, panel content: Loopy belief propagation for the four-state Potts model on the regular graph (V, E) with degree 4, $s_i \in \Omega = \{0, 1, 2, 3\}$. Legend: global minimum state (stable), local minimum state (metastable), and local maximum state (unstable) of the Bethe free energy. Limiting marginals $(P_i(0), P_i(1), P_i(2), P_i(3))$: case I ($h_0 \to +0$, $h_1 = h_2 = h_3 = 0$) gives (1, 0, 0, 0); case II ($h_1 \to +0$) gives (0, 1, 0, 0); case III ($h_2 \to +0$) gives (0, 0, 1, 0); case IV ($h_3 \to +0$) gives (0, 0, 0, 1); case V gives (1/4, 1/4, 1/4, 1/4).]

Fig. 10.11 Internal energy u in Eq. (10.144) and the order parameter in loopy belief propagation
(Bethe approximation) for the four-state Potts model in Eqs. (10.132) and (10.133) on the regular
graph (V, E) of degree 4

Fig. 10.12 Bethe free energy $f_{\rm Bethe}(m^{(1)}, m^{(2)})$ for the three-state Potts model in Eqs. (10.132)
and (10.133) on the regular graph (V, E) of degree 4. a $\frac{J}{k_B T}$ = 0.850. b $\frac{J}{k_B T}$ = 0.880. c $\frac{J}{k_B T}$ = 0.881.
d $\frac{J}{k_B T}$ = 0.882. e $\frac{J}{k_B T}$ = 0.885. f $\frac{J}{k_B T}$ = 0.920

Fig. 10.13 Bethe free energy of Eq. (10.148) for the four-state Potts model in Eqs. (10.132)
and (10.133) on the regular graph (V, E) of degree 4. a $\frac{J}{k_B T}$ = 0.900. b $\frac{J}{k_B T}$ = 1.000. c $\frac{J}{k_B T}$ = 1.010.
d $\frac{J}{k_B T}$ = 1.020. e $\frac{J}{k_B T}$ = 1.050. f $\frac{J}{k_B T}$ = 1.100

The free energy in Eq. (10.55) is expressed as follows:
Ы =) 
КвТ kgT 


TIENER?  mnyjeQ {,ЛЄЕ 


рас well din (4 vit") |} 


TEQIIEQ тує 
(10.150) 


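By using the Gauss integral formula

$$\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^{2} + ax\right)dx = \exp\left(\frac{1}{2}a^{2}\right), \tag{10.151}$$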


the expression for the free energy is rewritten as 


J h 
= ВТ Z| ——, — 
КТ КВТ 


-A(Z E 2 [е 
тє97є9 тує 2л 


li mn A oe |7 Nha (10.152) 
2 Ma P E ni COS! Viet | kar 2! " 


Note that the procedure in which a new continuous variable x is introduced in Eq.
(10.152) is referred to as a Hubbard-Stratonovich transformation [13]. Moreover,
by replacing the variable $x$ by $y = \frac{x}{\sqrt{|V|}}\left(\frac{k_B T}{J}\right)^{1/2}$, the free energy can be written as


J h 1 
тт) ) 52 (X EI (10.153) 


where 


J 
u(y) = (27 =) y+ э? (2ею®\( тк» +m) ). (10.154) 


ieV 
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We now consider the CUM 


J h J h 
"(zr or) „ыу lee ur) i 


ieV s,eQs,eQ Оа? 


Eqs. (10.109) and (10.111) with Eq. (10.149) as follows: 


arr) = взт) Сета) 
[E hmo H mellar (Jy4 н))))е (10.155) 


+50 
| ехр УН (dy 


lim 
IVi >+% 


Because it is valid that 


im SHO) =, m im s (Vo у! Vi СЕ po) 


J 
= "i tanh( Ly +). (10.156) 


we obtain the magnetization as 
1 
( J h ) | ex(ii( vos) + ту ale hae ТЯ innt yas + »)))) 
m А = lim 
ОЗИ oe хр(1И Omax)) 


1 
= tanh (/ ушах 4 »). (10.157) 


where 
ah ИЛЕ, МИ (10.158) 
max — lan max 3 
y ure” 


by using a saddle point method [37]. Equations (10.157) and (10.158) reduce to the
following mean-field equation for $m\left(\frac{J}{k_B T}, \frac{h}{k_B T}\right)$:

$$m\left(\frac{J}{k_B T}, \frac{h}{k_B T}\right) = \tanh\left(\frac{1}{k_B T}\left(J\,m\left(\frac{J}{k_B T}, \frac{h}{k_B T}\right) + h\right)\right). \tag{10.159}$$

This means that it is possible to treat the Ising model on the complete graph in the
thermodynamic limit analytically using the mean-field method.
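The statement that the mean-field equation becomes exact on the complete graph can be illustrated by comparing it with a direct finite-size computation. The sketch below assumes the energy function H(s) = -(J/|V|) Σ_{i<j} s_i s_j - h Σ_i s_i, so that a configuration with total spin M has Boltzmann weight exp(K(M^2 - N)/(2N) + bM) with K = J/(k_B T), b = h/(k_B T), and N = |V|. The function names and the chosen parameter values are illustrative.

```python
import math

def exact_magnetization(N, K, b):
    """Exact (1/N) <sum_i s_i> on the complete graph with N nodes.

    Configurations are grouped by the number k of up spins (binomial
    multiplicity); weights are handled in log space for stability.
    """
    log_w, mags = [], []
    for k in range(N + 1):
        M = 2 * k - N
        log_binom = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        log_w.append(log_binom + K * (M * M - N) / (2.0 * N) + b * M)
        mags.append(M / N)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    return sum(wi * mi for wi, mi in zip(w, mags)) / sum(w)

def mean_field_magnetization(K, b, tol=1e-13, max_iter=100000):
    """Solve m = tanh(K*m + b) by fixed-point iteration."""
    m = 1.0 if b >= 0 else -1.0
    for _ in range(max_iter):
        new = math.tanh(K * m + b)
        if abs(new - m) < tol:
            return new
        m = new
    return m

if __name__ == "__main__":
    for K, b in ((0.5, 0.1), (1.5, 0.05)):
        for N in (50, 500, 5000):
            print(K, b, N, exact_magnetization(N, K, b), mean_field_magnetization(K, b))
```

As N grows, the finite-size magnetization approaches the fixed point of the mean-field equation, which is the content of the saddle point argument above.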

By combining the replica method with the Hubbard-Stratonovich transformation
and the saddle point method, it is possible to treat the random average in Eq. (10.25)
for the Ising model with non-uniform external fields on the complete graph
analytically [13, 76]. In statistical mechanics, this kind of approach has been developed as
the spin glass theory [77–80]. Such computational techniques that use the replica
method for Ising models with spatially non-uniform interactions and external fields
on the complete graph have been used in statistical performance analyses
of many probabilistic information processing systems [13, 15–17].

Next, we consider the belief propagation method for the Ising model on the
complete graph in Eq. (10.149) with Eqs. (10.109) and (10.111). For infinitesimally
small $|V|^{-1}$, the message passing rule in Eq. (10.95) can be expanded to


Zi 1 J 
Lai) = - h 
IL ji Gi) ixl П ш D Ze (буо + 2) 


ledj\{i} 


2 у | п = h s ig » 
= Iu jGj) J exp 14 sitj + O(|V| 
203 oco (n {i} | IVI kBT ( ) 
212} 
TI 280 »( T Bris + О(УГ з) 


Zi. осо 


212} 1 J a —2 
= Zan, 14 kaT \ VI У^ (т) Sj +O(IV| ) 
Л TjEQR 


2] | J Pn ls 2 


By substituting Eq. (10.160) into Eq. (10.91), the marginal probabilities can be
expressed as follows:


1 J ~ 
exp КТ ТЛ у, Уу `2; (9) + № Si 


~ JEV\{i}tjEQ 

Ri(si) = - 
У сехр Aur T = 2. Deak; (27) +A |t 
GEQ JEV\{i} tj EQ 


O(|V|!) (V| + оо, s;eQ, ieV), 
(10.161) 


Ri (si, sj) = Ё, (э) R;(s;) + O(|V|7!) (JV| + оо, SEQ, sjeQ, {i, JEE). (10.162) 

Equation (10.161) can be regarded as a system of simultaneous deterministic
equations for $\{R_i(\xi) \,|\, \xi\in\Omega, i\in V\}$ and is equivalent to the mean-field equation in Eq.
(10.68) for Eq. (10.149) with Eqs. (10.109) and (10.111).


10.3.5 Probabilistic Segmentation by Potts Prior and Loopy
Belief Propagation


In Sect. 10.2.3, we gave the fundamental framework of probabilistic segmentation 
based on the Potts prior, and reduced the framework of the EM procedure for esti- 
mating hyperparameters to the extremum conditions of the Q-function as shown in 
Eqs. (10.41), (10.42), and (10.43) with Eqs. (10.44), (10.45), and (10.46). These 
frameworks can be realized by combining them with the loopy belief propagation in 
Sect. 10.3.2 to give the following practical procedures [70]: 


Probabilistic segmentation algorithm (Input :D, Output :@(D), i(D), @(D), C(D),$(D)) 


Step 1: Input the data vector d and set the initial values of hyperparameters a(D), 
a(D), C(D) and messages in the loopy belief propagation 1325 D)|icV, 
j€8i, s;€Q} for the posterior probability distribution. We set? «— 0 as the number 
of iterations of the EM procedure. " 

Step 2 (E-step)  Sett < + Land updateu(D), a(D), C(D), {R j>; (si, D)|s;ieQ, 
іє}, jeði} using the following procedures: 


Y exp(2@(D)65,,r,)g(dj|tj,@(tj, Р), Cj, D) [| 2-0. D) 


ТО reg keàjNi) 
1 Y JO epa) eldi |ti, @(ту, Р), ©су. D)) [| 0. D) 
rt EQT; EQ keaj\{i} 
(s;EQ,i€V, jedi), (10.163) 
Peis D) < IL ji Gi) (sjie&, TEV, j€8i), (10.164) 


B; <= X s(di|i. ac. 0), ё, D)) | [Aritti D) GeV), (10.165) 


GER keði 


віл < У, b П йкы, D))a(di|ni Gi. Р), Сат, D)) 
тєтє Кєдї\{/} 


хехр(2@(Р)8; « i) 


хв(4у|ту.®(ту, Р), бту, D))( П #30. D)) (di, jJ € E), 
kedj\{i} 
(10.166) 
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YF tie (di|si, a6, D), Со, р) (Па. D) 


iev ^! keði 
a(si) < 1 (s;€Q), 
yy 8(@ |н, аби, D), Со. D) (T [Rii D)) 
iev ^! keði 
(10.167) 


1 2. ae » ^ 7 
YF 4i - Ao, D))(di – alsi, D)) e(d; |si, Gi, D), 66. D) (Па. р)) 
C) P ієү keði (s;i€Q), 
Y. le(s a(s;, D) C; р)( П i (Si р)) 
B; (4 [ES ГЕЈ , [EI >t 1, 


iev ' keði 


(10.168) 


1 


( — ôr 1; )exp( 28 (D)5,, .,) 


2. 1 
Фф) У ( 


(ЛЕЕ Bua ENTER 


x( [| йыт. р))в(&|т,@(т, D). С, р) 


keði\ {j} 
a П Шеър D))e(d;|r;. aG;. D), C(r;, р))). 
keaj\{i} 
(10.169) 
а(5:, D) < a(si) (sie), (10.170) 
С(5;, D) < С(5;) (sje). (10.171) 


Неге, g(di E a(&, D), С(Е, р)) is defined by Eq. (10.31) for each state £ (€Q). 

Step 3 (M-step): Set the initial values of the messages (AE )|& €Q} in the loopy 
belief propagation for the Potts prior and repeat the following procedure until 
@(а) and (A(£)|£eQ) converge: 


Ў exp(2@(D)ôs; 1, JAC, р)? 


rjeQ 


У Y exp(28(D)5,, r JAC, р)? 


тє0т;є0 


(si) < (s;€Q), (10.172) 
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À(s;, d) <= A(s;) (s; €&2), (10.173) 


Y У (1-а), DPexp(280)5s су Jj, D 


m Er 1 ENT; EQ 174 
(Р) < aD) x( = = = 
1+a(D) Y Y. Dyexp(2@(D)5xj,1; Xj. D» 
TENT; EQ 
(10.174) 


Step 4 Compute the output $(D) = (Sı (D), $2 (D), - - -, Sivi (D)) as follows: 


dilsi, (si, D), (si, D) ] ài. D) GeV). (10.175) 


$i (D) < argmax «( 
sie 
keði 


Stop if the hyperparameters @(D), a(s;, D) (s;€Q), and C(s;, D) (s;€Q) converge 
and return to Step 2 otherwise. 


Some of the numerical experimental results are shown in Fig. 10.14. The Potts
prior has the first-order phase transition as shown in Sect. 10.3.6. Figure 10.14 shows
how the hyperparameter $2\alpha = \frac{J}{k_B T}$ converges in the EM procedure with loopy belief
propagation under the first-order phase transition.


10.3.6 Real-Space Renormalization Group Method and 
Sublinear Modeling of Statistical Machine Learning 


First, we explore the most fundamental real-space renormalization procedure for the 
Ising model in Eq. (10.49) on the ring graph (V, E), where 


$$E = \{\{1, 2\}, \{2, 3\}, \{3, 4\}, \cdots, \{|V| - 1, |V|\}, \{|V|, 1\}\}, \tag{10.176}$$


in the case of |V| = 2^. We have the following equality: 


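$$\sum_{s_2\in\Omega}\sum_{s_4\in\Omega}\sum_{s_6\in\Omega}\cdots\sum_{s_{|V|}\in\Omega}\ \prod_{\{i,j\}\in E}\exp\left(\frac{J}{k_B T}s_i s_j\right)$$
$$= \left(\sum_{s_2\in\Omega}\exp\left(\frac{J}{k_B T}s_2(s_1 + s_3)\right)\right)\left(\sum_{s_4\in\Omega}\exp\left(\frac{J}{k_B T}s_4(s_3 + s_5)\right)\right)\times\cdots\times\left(\sum_{s_{|V|}\in\Omega}\exp\left(\frac{J}{k_B T}s_{|V|}(s_{|V|-1} + s_1)\right)\right)$$
$$= 2^{|V|/2}\left(\cosh\left(\frac{2J}{k_B T}\right)\right)^{\frac{1}{2}(1 + s_1 s_3)}\left(\cosh\left(\frac{2J}{k_B T}\right)\right)^{\frac{1}{2}(1 + s_3 s_5)}\times\cdots\times\left(\cosh\left(\frac{2J}{k_B T}\right)\right)^{\frac{1}{2}(1 + s_{|V|-1} s_1)}$$
$$= 2^{|V|/2}\left(\prod_{i=0}^{|V|/2 - 2}\exp\left((1 + s_{2i+1}s_{2i+3})\times\frac{1}{2}\ln\left(\cosh\left(\frac{2J}{k_B T}\right)\right)\right)\right)\exp\left((1 + s_{|V|-1}s_1)\times\frac{1}{2}\ln\left(\cosh\left(\frac{2J}{k_B T}\right)\right)\right). \tag{10.177}$$

[Figure 10.14, panel content: Bayesian image segmentation by the Potts model and loopy belief propagation, with hyperparameter estimation by the EM algorithm. Panels: four-state Potts model (four labels) and eight labels.]

Fig. 10.14 Numerical experimental results of probabilistic segmentations by Potts prior and loopy
belief propagation. The graph (V, E) is a square grid graph with periodic boundary conditions along
the x- and y-directions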
For the Vl dimensional state vector (а, 43, d5, ***, djy—3), djv| 1), the marginal 


probability distribution Pus, vi -34vi- (ai, 3,05, `++, d|V|-3, Q|V|-1 læ) is 
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expressed as 


Pss, Ivi -34v|- 1) (S15 835 $5,775, S|v|-3: уіл) 


"Ж. ч "Я 7A k. 
=) ) ) fece ) ) P (S1, 52, 5з, S4, 85, 865 ** 5 SV|-3, 5|у]|—25|у|—1› 5|у|) 


S2EQSGEQSGEEQ — Siyj-2EQSy yj EQ 


Wi. 
2 


П exp(@ 59541, 5943) exp(@ syyj-1, sı) 
i=0 


= = , (10.178) 
2 
2 
pP у, Ax у, ехр(« 5941, 5243) exp(a siy\-1, 51) 
aj€QaseQ ayy|-1€eQ i=0 
where 
aes ден Fm (10.179) 
TA kgT ` | 
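The remaining nodes, which are denoted by odd numbers, are now renumbered by
replacing $i$ with $\frac{i+1}{2}$ for $i = 1, 3, 5, \cdots, |V| - 3, |V| - 1$, and new sets $V^{(1)}$ and $E^{(1)}$
of nodes and edges and a new state vector $s^{(1)} = \left(s_1^{(1)}, s_2^{(1)}, s_3^{(1)}, \cdots, s_{|V|/2}^{(1)}\right)^{\rm T}$
are introduced as follows:

$$V^{(1)} \equiv \left\{1, 2, \cdots, \frac{|V|}{2}\right\}, \tag{10.180}$$

$$E^{(1)} \equiv \left\{\{1, 2\}, \{2, 3\}, \{3, 4\}, \cdots, \left\{\frac{|V|}{2} - 1, \frac{|V|}{2}\right\}, \left\{\frac{|V|}{2}, 1\right\}\right\}, \tag{10.181}$$

$$s_i^{(1)} \equiv s_{2i-1} \quad (i = 1, 2, \cdots, |V|/2). \tag{10.182}$$

For the $\frac{|V|}{2}$-dimensional state vector $s^{(1)} = \left(s_1^{(1)}, s_2^{(1)}, s_3^{(1)}, \cdots, s_{|V|/2}^{(1)}\right)^{\rm T}$, we
define a new renormalized probability distribution by

$$P^{(1)}\left(s^{(1)}\right) \equiv \frac{\displaystyle\prod_{\{i,j\}\in E^{(1)}}\exp\left(\alpha^{(1)}s_i^{(1)}s_j^{(1)}\right)}{\displaystyle\sum_{a_1^{(1)}\in\Omega}\sum_{a_2^{(1)}\in\Omega}\cdots\sum_{a_{|V|/2}^{(1)}\in\Omega}\ \prod_{\{i,j\}\in E^{(1)}}\exp\left(\alpha^{(1)}a_i^{(1)}a_j^{(1)}\right)}. \tag{10.183}$$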

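By repeating the above renormalizing procedures,

$$\sum_{s_2^{(r-1)}\in\Omega}\sum_{s_4^{(r-1)}\in\Omega}\cdots\sum_{s_{|V^{(r-1)}|}^{(r-1)}\in\Omega}\ \prod_{\{i,j\}\in E^{(r-1)}}\exp\left(\alpha^{(r-1)}s_i^{(r-1)}s_j^{(r-1)}\right)$$
$$= 2^{|V|/2^{r}}\left(\prod_{i=0}^{|V|/2^{r} - 2}\exp\left(\left(1 + s_{2i+1}^{(r-1)}s_{2i+3}^{(r-1)}\right)\times\frac{1}{2}\ln\left(\cosh\left(2\alpha^{(r-1)}\right)\right)\right)\right)\exp\left(\left(1 + s_{|V|/2^{r-1}-1}^{(r-1)}s_1^{(r-1)}\right)\times\frac{1}{2}\ln\left(\cosh\left(2\alpha^{(r-1)}\right)\right)\right), \tag{10.184}$$

$$\alpha^{(r)} = \frac{1}{2}\ln\left(\cosh\left(2\alpha^{(r-1)}\right)\right), \tag{10.185}$$

the renormalized probability of the r-th step is generated as follows:

$$P^{(r)}\left(s^{(r)}\right) = \frac{\displaystyle\prod_{\{i,j\}\in E^{(r)}}\exp\left(\alpha^{(r)}s_i^{(r)}s_j^{(r)}\right)}{\displaystyle\sum_{a_1^{(r)}\in\Omega}\sum_{a_2^{(r)}\in\Omega}\cdots\sum_{a_{|V|/2^{r}}^{(r)}\in\Omega}\ \prod_{\{i,j\}\in E^{(r)}}\exp\left(\alpha^{(r)}a_i^{(r)}a_j^{(r)}\right)}, \tag{10.186}$$

where

$$V^{(r)} = \left\{1, 2, \cdots, \frac{|V|}{2^{r}}\right\}, \tag{10.187}$$

$$E^{(r)} = \left\{\{1, 2\}, \{2, 3\}, \cdots, \left\{\frac{|V|}{2^{r}} - 1, \frac{|V|}{2^{r}}\right\}, \left\{\frac{|V|}{2^{r}}, 1\right\}\right\}, \tag{10.188}$$

$$s^{(r)} = \left(s_1^{(r)}, s_2^{(r)}, s_3^{(r)}, \cdots, s_{|V|/2^{r}}^{(r)}\right)^{\rm T}. \tag{10.189}$$

Note that $V^{(0)} = V$, $E^{(0)} = E$, $\alpha^{(0)} = \alpha$, $s^{(0)} = s$, and $P^{(0)}(s) = P(s)$.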
Equation (10.185) corresponds to the update rule from $\alpha^{(r-1)}$ to $\alpha^{(r)}$. By solving Eq.
(10.185) with respect to $\alpha^{(r-1)}$, we can derive the inverse transformation rule of the
real-space renormalization group procedure as follows:

$$\alpha^{(r-1)} = \frac{1}{2}\mathrm{arccosh}\left(\exp\left(2\alpha^{(r)}\right)\right). \tag{10.190}$$

If the hyperparameter $\alpha^{(r)}$ in the r-th renormalized probability distribution $P^{(r)}(s^{(r)})$
has been estimated from given data vectors by means of the EM algorithm for
renormalized probabilistic graphical models on ring graphs $(V^{(r)}, E^{(r)})$, we can estimate
the hyperparameter $\alpha = \alpha^{(0)}$ of the probabilistic graphical model (10.49) on the ring
graph (V, E) by repeatedly applying the inverse transformation rule of the real-space
renormalization group procedure (10.190).

Now, we extend the real-space renormalization group scheme for the probabilistic
graphical model on the ring graph to the square grid graph as a pair approximation
in the real-space renormalization group framework as follows:

$$\exp\left(\alpha^{(r)}s_1 s_3\right) \propto \sum_{s_2\in\Omega}\sum_{s_4\in\Omega}\exp\left(\alpha^{(r-1)}(s_1 s_2 + s_2 s_3 + s_1 s_4 + s_4 s_3)\right). \tag{10.191}$$

Equation (10.191) can be reduced to

$$\alpha^{(r)} = \ln\left(\cosh\left(2\alpha^{(r-1)}\right)\right). \tag{10.192}$$

The r-th renormalized probability distribution for Eq. (10.49) is expressed as

$$P^{(r)}\left(s^{(r)}\right) \propto \prod_{\{i,j\}\in E^{(r)}}\exp\left(\alpha^{(r)}s_i^{(r)}s_j^{(r)}\right). \tag{10.193}$$

The inversion formula of Eq. (10.192) can be derived as

$$\alpha^{(r-1)} = \frac{1}{2}\mathrm{arccosh}\left(\exp\left(\alpha^{(r)}\right)\right). \tag{10.194}$$

The above framework can be extended to the $|\Omega|$-state Potts model, as shown in
Fig. 10.15. The inverse renormalization group transformation can also be applied
to the probabilistic segmentations in Eqs. (10.41), (10.42), and (10.43) with Eqs.
(10.44), (10.45), and (10.46) in Sect. 10.2.3 [81]. One of the numerical experimental
results in the inverse renormalization group transformation in probabilistic
segmentations is shown in Fig. 10.16.
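The forward and inverse renormalization maps can be written down directly. The sketch below assumes the ring rule α ↦ (1/2) ln cosh(2α) with inverse (1/2) arccosh(exp(2α)), and the pair-approximation rule for the square grid α ↦ ln cosh(2α) with inverse (1/2) arccosh(exp(α)). The helper `inverse_flow`, which applies an inverse step r times to map a coarse-level estimate back to the original scale, is a hypothetical name introduced only for this illustration.

```python
import math

def rg_ring(alpha):
    """Ring decimation: alpha^(r) = (1/2) ln cosh(2 alpha^(r-1))."""
    return 0.5 * math.log(math.cosh(2.0 * alpha))

def rg_ring_inverse(alpha_r):
    """Inverse ring rule: alpha^(r-1) = (1/2) arccosh(exp(2 alpha^(r)))."""
    return 0.5 * math.acosh(math.exp(2.0 * alpha_r))

def rg_grid_pair(alpha):
    """Pair approximation on the square grid: alpha^(r) = ln cosh(2 alpha^(r-1))."""
    return math.log(math.cosh(2.0 * alpha))

def rg_grid_pair_inverse(alpha_r):
    """Inverse pair-approximation rule: alpha^(r-1) = (1/2) arccosh(exp(alpha^(r)))."""
    return 0.5 * math.acosh(math.exp(alpha_r))

def inverse_flow(alpha_r, r, inverse_step):
    """Apply an inverse step r times to recover the coupling at the original scale."""
    alpha = alpha_r
    for _ in range(r):
        alpha = inverse_step(alpha)
    return alpha

if __name__ == "__main__":
    alpha0 = 0.8
    coarse = rg_grid_pair(rg_grid_pair(alpha0))   # two coarse-graining steps
    print(coarse, inverse_flow(coarse, 2, rg_grid_pair_inverse))
```

In this reading, the hyperparameter would be estimated on the coarse graph, which has far fewer nodes, and then mapped back through the inverse rule, which is where the sublinear cost of the modeling comes from.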
[Figure 10.15, panel content: Real-space renormalization group theory for the Potts prior, showing the coarse-graining step and the corresponding transformation of the hyperparameter.]

Fig. 10.15 Fundamental framework of sublinear computational modeling by the inverse
renormalization group transformation in probabilistic segmentations

[Figure 10.16, panel content: Real-space renormalization group (RSRG) in Bayesian image segmentation with the Potts prior. Panels: hyperparameter estimation by maximization of the marginal likelihood with belief propagation for the original image; hyperparameter estimation by maximization of the marginal likelihood with belief propagation after coarse graining; segmentation by belief propagation.]

Fig. 10.16 Numerical experimental results of sublinear computational modeling in the inverse
renormalization group transformation in probabilistic segmentations


10.4 Quantum Statistical Machine Learning


This section explores the fundamental frameworks of quantum probabilistic graph- 
ical models based on energy matrices and density matrices. Note that every energy 
matrix needs to be Hermitian and have a density matrix that is defined by all the 
eigenvalues and all the eigenvectors of each energy matrix. If all the off-diagonal 
elements of the density matrix are zero, the diagonal elements correspond to the 
probability distribution in the probabilistic graphical model. First, we explain gen- 
eral frameworks of density matrices and their differentiations and define the min- 
imization of free energies of density matrices. Second, we give the definitions of 
tensor products of matrices as well as vectors. By using Pauli spin matrices as well 
as tensor products, we introduce quantum probabilistic graphical models. Finally, 
we extend the conventional EM algorithm to a quantum expectation-maximization 
(QEM) algorithm. 


10.4.1 Elementary Function and Differentiations 
of Hermitian Matrices 


Before proceeding with the quantum statistical mechanical extension of statistical 
machine learning, we need to explore some essential formulas for Hermitian matrices 
and their derivatives. Some fundamental properties of matrices for statistical infer- 
ence have appeared in Ref. [82]. In the present section, we give some useful formulas 
for treating the entropy in quantum probabilistic graphical models. 

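We consider the M x M Hermitian matrix A 

\[
A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1M} \\
A_{21} & A_{22} & \cdots & A_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} & A_{M2} & \cdots & A_{MM}
\end{pmatrix}, \tag{10.195}
\]

which satisfies $A = \overline{A}^{\mathrm{T}}$. Here we remark that $A^{\mathrm{T}}$ and $\overline{A}$ are the transpose and 
complex conjugate matrix of A, respectively. We introduce vertical and horizontal basis 
vectors in the M-dimensional space as follows: 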


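\[
|1\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix},\quad
|2\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix},\quad
|3\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \\ 0 \end{pmatrix},\quad \cdots,\quad
|M-1\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \\ 0 \end{pmatrix},\quad
|M\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \tag{10.196}
\]

and 

\[
\begin{aligned}
\langle 1| &= (1, 0, 0, 0, \cdots, 0, 0, 0), \\
\langle 2| &= (0, 1, 0, 0, \cdots, 0, 0, 0), \\
\langle 3| &= (0, 0, 1, 0, \cdots, 0, 0, 0), \\
&\;\;\vdots \\
\langle M-1| &= (0, 0, 0, 0, \cdots, 0, 1, 0), \\
\langle M| &= (0, 0, 0, 0, \cdots, 0, 0, 1).
\end{aligned} \tag{10.197}
\]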
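We can confirm that 

\[
\langle i|A|j\rangle = A_{ij} \quad (i \in \{1, 2, \cdots, M\},\ j \in \{1, 2, \cdots, M\}). \tag{10.198}
\]

The Hermitian matrix A is diagonalized as 

\[
A = U \Lambda U^{-1}, \tag{10.199}
\]

\[
\Lambda \equiv \begin{pmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
0 & 0 & \lambda_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_M
\end{pmatrix}, \tag{10.200}
\]

where all the eigenvalues, $\lambda_1, \lambda_2, \cdots, \lambda_M$, are always real numbers. For the 
eigenvector $u_i = (U_{1i}, U_{2i}, \cdots, U_{Mi})^{\mathrm{T}}$ corresponding to the eigenvalue $\lambda_i$, such that 
$A u_i = \lambda_i u_i$, for every $i \in \{1, 2, 3, \cdots, M\}$, the matrix U is defined by 

\[
U \equiv (u_1, u_2, u_3, \cdots, u_M) = \begin{pmatrix}
U_{11} & U_{12} & U_{13} & \cdots & U_{1M} \\
U_{21} & U_{22} & U_{23} & \cdots & U_{2M} \\
U_{31} & U_{32} & U_{33} & \cdots & U_{3M} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
U_{M1} & U_{M2} & U_{M3} & \cdots & U_{MM}
\end{pmatrix}. \tag{10.201}
\]

It is known that U is a unitary matrix that satisfies $U^{-1} = \overline{U}^{\mathrm{T}}$ for any Hermitian 
matrix A. If $\lambda_1$ is the maximum eigenvalue, its corresponding eigenvector $u_1$ is 
expressed using the following notation: 

\[
u_1 = \arg\max A. \tag{10.202}
\]

Note that argmax A is the eigenvector that corresponds to the maximum eigenvalue 
of A. 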
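For any Hermitian matrix A, the exponential function is defined by 

\[
\exp(A) \equiv \sum_{n=0}^{+\infty} \frac{1}{n!} A^n
= U \begin{pmatrix}
\exp(\lambda_1) & 0 & 0 & \cdots & 0 \\
0 & \exp(\lambda_2) & 0 & \cdots & 0 \\
0 & 0 & \exp(\lambda_3) & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \exp(\lambda_M)
\end{pmatrix} U^{-1}, \tag{10.203}
\]

and ln(A) is defined by the inverse function of exp(A) such that 

\[
\exp(\ln(A)) = A. \tag{10.204}
\]

In the present definition, we have 

\[
\exp(A \otimes I) = (\exp(A)) \otimes I, \qquad \exp(I \otimes A) = I \otimes (\exp(A)), \tag{10.205}
\]

where I is an identity matrix. 
For $|1 - \lambda_1| < 1, |1 - \lambda_2| < 1, \cdots, |1 - \lambda_M| < 1$, ln(A) is given by 

\[
\ln(A) = \ln(I - (I - A)) = -\sum_{n=1}^{+\infty} \frac{1}{n} (I - A)^n
= U \begin{pmatrix}
\ln(\lambda_1) & 0 & 0 & \cdots & 0 \\
0 & \ln(\lambda_2) & 0 & \cdots & 0 \\
0 & 0 & \ln(\lambda_3) & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \ln(\lambda_M)
\end{pmatrix} U^{-1}. \tag{10.206}
\]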
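By using Eqs. (10.203) and (10.206), we can confirm that 

\[
\begin{aligned}
\exp(\ln(A)) &= \sum_{n=0}^{+\infty} \frac{1}{n!} (\ln(A))^n \\
&= U \left( \sum_{n=0}^{+\infty} \frac{1}{n!}
\begin{pmatrix}
(\ln(\lambda_1))^n & 0 & \cdots & 0 \\
0 & (\ln(\lambda_2))^n & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & (\ln(\lambda_M))^n
\end{pmatrix} \right) U^{-1} \\
&= U \begin{pmatrix}
\exp(\ln(\lambda_1)) & 0 & \cdots & 0 \\
0 & \exp(\ln(\lambda_2)) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \exp(\ln(\lambda_M))
\end{pmatrix} U^{-1} \\
&= U \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_M
\end{pmatrix} U^{-1} = A.
\end{aligned} \tag{10.207}
\]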
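The eigendecomposition formulas (10.203), (10.206), and (10.207) can be checked numerically. The following is a minimal sketch in Python with NumPy; the helper name `herm_fun` and the random test matrix are assumptions introduced only for illustration, and the matrix is scaled so that the truncated power series converges quickly.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A through A = U diag(lam) U^{-1}."""
    lam, U = np.linalg.eigh(A)            # real eigenvalues lam, unitary U
    return (U * f(lam)) @ U.conj().T      # U diag(f(lam)) U^{-1}, using U^{-1} = conj(U)^T

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = X @ X.conj().T / 8 + 0.5 * np.eye(4)  # hypothetical Hermitian, positive definite test matrix

exp_A = herm_fun(A, np.exp)
ln_A = herm_fun(A, np.log)
exp_ln_A = herm_fun(ln_A, np.exp)         # should reproduce A, as in Eq. (10.207)

# Truncated power series sum_{n=0}^{59} A^n / n!, to compare with Eq. (10.203)
series = np.zeros((4, 4), dtype=complex)
term = np.eye(4, dtype=complex)
for n in range(1, 61):
    series += term
    term = term @ A / n

print(np.allclose(exp_ln_A, A), np.allclose(series, exp_A))
```

Working through the eigenvalues avoids summing the series at all, which is why the later sketches in this section reuse the same pattern.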
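Moreover, we have 

\[
\begin{aligned}
\sum_{n=0}^{N} (I - A)^n - (I - A) \sum_{n=0}^{N} (I - A)^n
&= I + (I - A) + (I - A)^2 + \cdots + (I - A)^N \\
&\quad - (I - A) - (I - A)^2 - \cdots - (I - A)^{N+1} \\
&= I - (I - A)^{N+1},
\end{aligned} \tag{10.208}
\]

such that 

\[
\sum_{n=0}^{N} (I - A)^n = (I - (I - A))^{-1} \left( I - (I - A)^{N+1} \right). \tag{10.209}
\]

We have 

\[
(I - A)^{N+1} = U \begin{pmatrix}
(1 - \lambda_1)^{N+1} & 0 & \cdots & 0 \\
0 & (1 - \lambda_2)^{N+1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & (1 - \lambda_M)^{N+1}
\end{pmatrix} U^{-1} \to 0 \quad (N \to +\infty), \tag{10.210}
\]

so it is valid that 

\[
A^{-1} = (I - (I - A))^{-1} = \sum_{n=0}^{+\infty} (I - A)^n. \tag{10.211}
\]

Note that exp(A) and ln(A) as well as $A^{-1}$ are also Hermitian matrices in the present 
case. (This can be shown by using $\overline{A^n}^{\mathrm{T}} = (\overline{A}^{\mathrm{T}})^n$.) 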
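The geometric-series identities (10.208) to (10.211) can be illustrated in the same way. The sketch below is an assumption-laden toy: it builds a real symmetric matrix whose eigenvalues are rescaled into the interval (0.2, 1.8), so that the condition $|1 - \lambda_m| < 1$ holds, and then compares the partial sum with the inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 3))
S = (X + X.T) / 2
lam, U = np.linalg.eigh(S)
# Rescale the eigenvalues into [0.2, 1.8] so that |1 - lam_m| <= 0.8 < 1
lam = 1.0 + 0.8 * lam / np.max(np.abs(lam))
A = (U * lam) @ U.T                      # hypothetical test matrix

I = np.eye(3)
partial = np.zeros((3, 3))
power = I.copy()
for n in range(400):                     # partial = sum_{n=0}^{399} (I - A)^n
    partial += power
    power = power @ (I - A)              # after the loop: power = (I - A)^400

print(np.allclose(A @ partial, I - power))        # finite-N identity behind Eq. (10.209)
print(np.allclose(partial, np.linalg.inv(A)))     # limit of Eq. (10.211)
```

The convergence rate is governed by the largest $|1 - \lambda_m|$, so the number of terms needed grows quickly as an eigenvalue approaches 0 or 2.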
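We now introduce a Hermitian matrix function G(x) for any real number x as 
follows: 

\[
G(x) \equiv \begin{pmatrix}
G_{11}(x) & G_{12}(x) & G_{13}(x) & \cdots & G_{1M}(x) \\
G_{21}(x) & G_{22}(x) & G_{23}(x) & \cdots & G_{2M}(x) \\
G_{31}(x) & G_{32}(x) & G_{33}(x) & \cdots & G_{3M}(x) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
G_{M1}(x) & G_{M2}(x) & G_{M3}(x) & \cdots & G_{MM}(x)
\end{pmatrix}. \tag{10.212}
\]

We have 

\[
G_{ij}(x) = \overline{G_{ji}(x)} \quad (i \in \{1, 2, \cdots, M\},\ j \in \{1, 2, \cdots, M\}), \tag{10.213}
\]

such that 

\[
\langle i|G(x)|j\rangle = \overline{\langle j|G(x)|i\rangle} \quad (i \in \{1, 2, \cdots, M\},\ j \in \{1, 2, \cdots, M\}). \tag{10.214}
\]

It is obvious that the derivative of the matrix G(x) with respect to x, namely, 

\[
\frac{d}{dx} G(x) \equiv \begin{pmatrix}
\frac{d}{dx}G_{11}(x) & \frac{d}{dx}G_{12}(x) & \frac{d}{dx}G_{13}(x) & \cdots & \frac{d}{dx}G_{1M}(x) \\
\frac{d}{dx}G_{21}(x) & \frac{d}{dx}G_{22}(x) & \frac{d}{dx}G_{23}(x) & \cdots & \frac{d}{dx}G_{2M}(x) \\
\frac{d}{dx}G_{31}(x) & \frac{d}{dx}G_{32}(x) & \frac{d}{dx}G_{33}(x) & \cdots & \frac{d}{dx}G_{3M}(x) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{d}{dx}G_{M1}(x) & \frac{d}{dx}G_{M2}(x) & \frac{d}{dx}G_{M3}(x) & \cdots & \frac{d}{dx}G_{MM}(x)
\end{pmatrix}, \tag{10.215}
\]

is also a Hermitian matrix such that 

\[
\langle i|\frac{d}{dx}G(x)|j\rangle = \overline{\langle j|\frac{d}{dx}G(x)|i\rangle} \quad (i \in \{1, 2, \cdots, M\},\ j \in \{1, 2, \cdots, M\}). \tag{10.216}
\]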


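We have the following equalities: 

\[
\frac{d}{dx}\left(\mathrm{Tr}[G(x)]\right) = \mathrm{Tr}\left[\frac{d}{dx}G(x)\right], \tag{10.217}
\]

and 

\[
\mathrm{Tr}\left(\left(\frac{d}{dx}G(x)\right)G(x)\right) = \mathrm{Tr}\left(G(x)\left(\frac{d}{dx}G(x)\right)\right). \tag{10.218}
\]

Equation (10.218) can be confirmed as follows: 

\[
\begin{aligned}
\mathrm{Tr}\left(\left(\frac{d}{dx}G(x)\right)G(x)\right)
&= \sum_{i=1}^{M}\sum_{j=1}^{M} \langle i|\frac{d}{dx}G(x)|j\rangle \langle j|G(x)|i\rangle \\
&= \sum_{j=1}^{M}\sum_{i=1}^{M} \langle j|G(x)|i\rangle \langle i|\frac{d}{dx}G(x)|j\rangle
= \mathrm{Tr}\left(G(x)\left(\frac{d}{dx}G(x)\right)\right).
\end{aligned} \tag{10.219}
\]

By using Eqs. (10.217) and (10.219), we derive the following fundamental formula: 

\[
\begin{aligned}
\frac{d}{dx}\mathrm{Tr}\left(G(x)^n\right)
&= \mathrm{Tr}\left(\frac{d}{dx}G(x)^n\right) \\
&= \mathrm{Tr}\left(\left(\frac{d}{dx}G(x)\right)G(x)^{n-1}\right) + \mathrm{Tr}\left(G(x)\left(\frac{d}{dx}G(x)^{n-1}\right)\right) \\
&= \mathrm{Tr}\left(\left(\frac{d}{dx}G(x)\right)G(x)^{n-1}\right) + \mathrm{Tr}\left(G(x)\left(\frac{d}{dx}G(x)\right)G(x)^{n-2}\right)
 + \mathrm{Tr}\left(G(x)^2\left(\frac{d}{dx}G(x)^{n-2}\right)\right) \\
&= \cdots = \sum_{k=0}^{n-1} \mathrm{Tr}\left(G(x)^k\left(\frac{d}{dx}G(x)\right)G(x)^{n-1-k}\right) \\
&= n\,\mathrm{Tr}\left(\left(\frac{d}{dx}G(x)\right)G(x)^{n-1}\right),
\end{aligned} \tag{10.220}
\]

where, in the last step, the factor $G(x)^k$ is moved cyclically inside the trace in the same 
way as in Eq. (10.219). 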


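From Eqs. (10.217) and (10.220), we can confirm the following equality: 

\[
\begin{aligned}
\frac{d}{dx}\mathrm{Tr}\left(\ln(G(x))\right)
&= \frac{d}{dx}\mathrm{Tr}\left(\ln(I - (I - G(x)))\right) \\
&= -\frac{d}{dx}\mathrm{Tr}\left(\sum_{n=1}^{+\infty}\frac{1}{n}(I - G(x))^n\right) \\
&= -\sum_{n=1}^{+\infty}\frac{1}{n}\frac{d}{dx}\mathrm{Tr}\left((I - G(x))^n\right) \\
&= -\sum_{n=1}^{+\infty}\frac{1}{n}\mathrm{Tr}\left(n\left(\frac{d}{dx}(I - G(x))\right)(I - G(x))^{n-1}\right) \\
&= \mathrm{Tr}\left(\left(\frac{d}{dx}G(x)\right)\sum_{n=1}^{+\infty}(I - G(x))^{n-1}\right) \\
&= \mathrm{Tr}\left(\left(\frac{d}{dx}G(x)\right)(I - (I - G(x)))^{-1}\right) \\
&= \mathrm{Tr}\left(\left(\frac{d}{dx}G(x)\right)G(x)^{-1}\right).
\end{aligned} \tag{10.221}
\]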


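By using Eq. (10.221), we can confirm the following equality: 

\[
\begin{pmatrix}
\frac{\partial}{\partial A_{11}}\mathrm{Tr}[A\ln(A)] & \frac{\partial}{\partial A_{21}}\mathrm{Tr}[A\ln(A)] & \cdots & \frac{\partial}{\partial A_{M1}}\mathrm{Tr}[A\ln(A)] \\
\frac{\partial}{\partial A_{12}}\mathrm{Tr}[A\ln(A)] & \frac{\partial}{\partial A_{22}}\mathrm{Tr}[A\ln(A)] & \cdots & \frac{\partial}{\partial A_{M2}}\mathrm{Tr}[A\ln(A)] \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial}{\partial A_{1M}}\mathrm{Tr}[A\ln(A)] & \frac{\partial}{\partial A_{2M}}\mathrm{Tr}[A\ln(A)] & \cdots & \frac{\partial}{\partial A_{MM}}\mathrm{Tr}[A\ln(A)]
\end{pmatrix}
= \ln(A) + I. \tag{10.222}
\]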
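Equations (10.221) and (10.222) lend themselves to a finite-difference check. The sketch below assumes a one-parameter family $G(x) = A + xB$ with Hermitian A and B chosen at random (both hypothetical), and compares central differences with $\mathrm{Tr}(B\,G^{-1})$ and $\mathrm{Tr}(B(\ln G + I))$ at x = 0; the second expression is what Eq. (10.222) implies along the direction B.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.conj().T

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Y = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = X @ X.conj().T + np.eye(3)        # Hermitian, positive definite
B = (Y + Y.conj().T) / 2              # Hermitian direction, plays the role of dG/dx

def G(x):
    return A + x * B

def tr_ln(x):                         # Tr[ln G(x)]
    return np.sum(np.log(np.linalg.eigvalsh(G(x))))

def tr_g_ln_g(x):                     # Tr[G(x) ln G(x)]
    Gx = G(x)
    return np.trace(Gx @ herm_fun(Gx, np.log)).real

eps = 1e-5
numeric_ln = (tr_ln(eps) - tr_ln(-eps)) / (2 * eps)
analytic_ln = np.trace(B @ np.linalg.inv(A)).real              # Eq. (10.221)

numeric_ent = (tr_g_ln_g(eps) - tr_g_ln_g(-eps)) / (2 * eps)
analytic_ent = np.trace(B @ (herm_fun(A, np.log) + np.eye(3))).real

print(abs(numeric_ln - analytic_ln), abs(numeric_ent - analytic_ent))
```

Both differences should be at the level of the finite-difference error, far below the size of the derivatives themselves.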


10.4.2  Minimization of Free Energy Functionals for Density 
Matrices 


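For any M x M Hermitian matrix H that satisfies $H = \overline{H}^{\mathrm{T}}$, the free energy functional 
for an M x M trial density matrix 

\[
R = \begin{pmatrix}
R_{11} & R_{12} & \cdots & R_{1M} \\
R_{21} & R_{22} & \cdots & R_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
R_{M1} & R_{M2} & \cdots & R_{MM}
\end{pmatrix} \tag{10.223}
\]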
is defined by 

\[
F[R] \equiv \mathrm{Tr}\left[R\left(H + k_{\mathrm{B}}T\ln(R)\right)\right]. \tag{10.224}
\]

The density matrix P is determined so as to satisfy the following conditional 
minimization with the normalization condition: 

\[
P = \arg\min_{R}\left\{F[R] \,\middle|\, \mathrm{Tr}[R] = 1\right\}, \tag{10.225}
\]

and this reduces to 

\[
P = \frac{1}{Z}\exp\left(-\frac{1}{k_{\mathrm{B}}T}H\right), \tag{10.226}
\]

\[
Z \equiv \mathrm{Tr}\left[\exp\left(-\frac{1}{k_{\mathrm{B}}T}H\right)\right]. \tag{10.227}
\]

First, we introduce the Lagrange multiplier $\lambda$ to ensure the normalization condition 
as follows: 

\[
\mathcal{L}[R] \equiv F[R] - \lambda\left(\mathrm{Tr}[R] - 1\right). \tag{10.228}
\]

The elements $R_{mm'}$ of R are determined so as to satisfy the following extremum condition: 

\[
\frac{\partial}{\partial R_{mm'}}\mathcal{L}[R] = 0 \quad (m = 1, 2, \cdots, M,\ m' = 1, 2, \cdots, M). \tag{10.229}
\]

Finally, by determining $\lambda$ so as to satisfy the normalization condition Tr[R] = 1, 
Eqs. (10.226) and (10.227) can be derived. 
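Because the energy matrix H is a Hermitian matrix, all the eigenvalues $h^{(m)}$ are 
always real numbers. When H is real symmetric, as in the models considered below, all the 
eigenvectors $\psi^{(m)} = (\psi^{(m)}(1), \psi^{(m)}(2), \cdots, \psi^{(m)}(M))^{\mathrm{T}}$ can be chosen as real 
vectors and are defined by 

\[
H \begin{pmatrix} \psi^{(m)}(1) \\ \psi^{(m)}(2) \\ \vdots \\ \psi^{(m)}(M) \end{pmatrix}
= h^{(m)} \begin{pmatrix} \psi^{(m)}(1) \\ \psi^{(m)}(2) \\ \vdots \\ \psi^{(m)}(M) \end{pmatrix}
\quad (m = 1, 2, \cdots, M), \tag{10.230}
\]

where 

\[
\left(\psi^{(m)}(1), \psi^{(m)}(2), \cdots, \psi^{(m)}(M)\right)
\begin{pmatrix} \psi^{(m)}(1) \\ \psi^{(m)}(2) \\ \vdots \\ \psi^{(m)}(M) \end{pmatrix} = 1
\quad (m = 1, 2, \cdots, M). \tag{10.231}
\]

By using these eigenvalues and eigenvectors of H, the density matrix can be 
expressed as 

\[
P = \Psi \begin{pmatrix}
p^{(1)} & 0 & \cdots & 0 \\
0 & p^{(2)} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & p^{(M)}
\end{pmatrix} \Psi^{\mathrm{T}},
\qquad
\Psi \equiv \begin{pmatrix}
\psi^{(1)}(1) & \psi^{(2)}(1) & \cdots & \psi^{(M)}(1) \\
\psi^{(1)}(2) & \psi^{(2)}(2) & \cdots & \psi^{(M)}(2) \\
\vdots & \vdots & \ddots & \vdots \\
\psi^{(1)}(M) & \psi^{(2)}(M) & \cdots & \psi^{(M)}(M)
\end{pmatrix}, \tag{10.232}
\]

where 

\[
p^{(m)} \equiv \frac{\exp\left(-\frac{1}{k_{\mathrm{B}}T}h^{(m)}\right)}{\mathrm{Tr}\left[\exp\left(-\frac{1}{k_{\mathrm{B}}T}H\right)\right]}
\quad (m = 1, 2, \cdots, M). \tag{10.233}
\]

This means that the probability of each state $\psi^{(m)}$ is $p^{(m)}$ for $m = 1, 2, \cdots, M$. 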
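The variational statement in Eqs. (10.224) to (10.227) can be probed numerically. The sketch below is only an illustration under stated assumptions: the energy matrix H is a random real symmetric matrix, the temperature `kT` is an arbitrary value standing in for $k_{\mathrm{B}}T$, and the trial density matrices are random positive definite matrices with unit trace. It checks that $F[P] = -k_{\mathrm{B}}T\ln Z$ and that no sampled trial matrix has a lower free energy.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.conj().T

def free_energy(R, H, kT):
    # F[R] = Tr[R (H + kT ln R)], Eq. (10.224)
    return np.trace(R @ (H + kT * herm_fun(R, np.log))).real

def random_density(rng, M):
    # Hypothetical trial density matrix: positive definite with unit trace
    Y = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
    R = Y @ Y.conj().T
    return R / np.trace(R).real

rng = np.random.default_rng(3)
M, kT = 4, 0.7
X = rng.normal(size=(M, M))
H = (X + X.T) / 2                      # hypothetical real symmetric energy matrix

W = herm_fun(-H / kT, np.exp)
Z = np.trace(W).real                   # Eq. (10.227)
P = W / Z                              # Eq. (10.226)

F_opt = free_energy(P, H, kT)
F_trials = [free_energy(random_density(rng, M), H, kT) for _ in range(200)]

print(F_opt, -kT * np.log(Z), min(F_trials))
```

Random sampling cannot prove the minimization, but a single counterexample would expose an error in the formulas, so it is a useful sanity test.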


10.4.3 Tensor Products 


This section explores tensor products (Kronecker products) [82]. Tensor products 
include some fundamental mathematical concepts for achieving quantum statistical 


mechanical extensions of probabilistic graphical models. 
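We introduce tensor products for matrices and vectors by the following definitions: 

\[
\begin{aligned}
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \otimes
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
&\equiv \begin{pmatrix}
A_{11}\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} &
A_{12}\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \\
A_{21}\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} &
A_{22}\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
\end{pmatrix} \\
&= \begin{pmatrix}
A_{11}B_{11} & A_{11}B_{12} & A_{12}B_{11} & A_{12}B_{12} \\
A_{11}B_{21} & A_{11}B_{22} & A_{12}B_{21} & A_{12}B_{22} \\
A_{21}B_{11} & A_{21}B_{12} & A_{22}B_{11} & A_{22}B_{12} \\
A_{21}B_{21} & A_{21}B_{22} & A_{22}B_{21} & A_{22}B_{22}
\end{pmatrix},
\end{aligned} \tag{10.234}
\]

\[
\begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix} \otimes \begin{pmatrix} B_{11} \\ B_{21} \end{pmatrix}
\equiv \begin{pmatrix}
A_{11}\begin{pmatrix} B_{11} \\ B_{21} \end{pmatrix} \\
A_{21}\begin{pmatrix} B_{11} \\ B_{21} \end{pmatrix}
\end{pmatrix}
= \begin{pmatrix} A_{11}B_{11} \\ A_{11}B_{21} \\ A_{21}B_{11} \\ A_{21}B_{21} \end{pmatrix}. \tag{10.235}
\]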
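We remark that 

\[
\begin{aligned}
&\left(\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \otimes
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}\right)
\left(\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} \otimes
\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}\right) \\
&\quad = \left(\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}\right) \otimes
\left(\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}\right).
\end{aligned} \tag{10.236}
\]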

Moreover, for the following general M x M matrix A and N x N matrix B, 

\[
A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1M} \\
A_{21} & A_{22} & \cdots & A_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} & A_{M2} & \cdots & A_{MM}
\end{pmatrix}, \qquad
B = \begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1N} \\
B_{21} & B_{22} & \cdots & B_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
B_{N1} & B_{N2} & \cdots & B_{NN}
\end{pmatrix}, \tag{10.237}
\]

we define the tensor product $A \otimes B$ as 

\[
\begin{aligned}
A \otimes B
&\equiv \begin{pmatrix}
A_{11}B & A_{12}B & \cdots & A_{1M}B \\
A_{21}B & A_{22}B & \cdots & A_{2M}B \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1}B & A_{M2}B & \cdots & A_{MM}B
\end{pmatrix} \\
&= \begin{pmatrix}
A_{11}B_{11} & \cdots & A_{11}B_{1N} & \cdots & A_{1M}B_{11} & \cdots & A_{1M}B_{1N} \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
A_{11}B_{N1} & \cdots & A_{11}B_{NN} & \cdots & A_{1M}B_{N1} & \cdots & A_{1M}B_{NN} \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
A_{M1}B_{11} & \cdots & A_{M1}B_{1N} & \cdots & A_{MM}B_{11} & \cdots & A_{MM}B_{1N} \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
A_{M1}B_{N1} & \cdots & A_{M1}B_{NN} & \cdots & A_{MM}B_{N1} & \cdots & A_{MM}B_{NN}
\end{pmatrix}.
\end{aligned} \tag{10.238}
\]


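Similarly, for vectors 

\[
a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{pmatrix}, \qquad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{pmatrix}, \tag{10.239}
\]

the tensor product $a \otimes b$ is defined as 

\[
a \otimes b \equiv \begin{pmatrix} a_1 b \\ a_2 b \\ \vdots \\ a_M b \end{pmatrix}
= \left(a_1b_1, a_1b_2, \cdots, a_1b_N, a_2b_1, a_2b_2, \cdots, a_2b_N, \cdots, a_Mb_1, a_Mb_2, \cdots, a_Mb_N\right)^{\mathrm{T}}, \tag{10.240}
\]

\[
\begin{aligned}
a^{\mathrm{T}} \otimes b^{\mathrm{T}}
&= (a_1, a_2, \cdots, a_M) \otimes (b_1, b_2, \cdots, b_N)
\equiv \left(a_1 b^{\mathrm{T}}, a_2 b^{\mathrm{T}}, \cdots, a_M b^{\mathrm{T}}\right) \\
&= \left(a_1b_1, a_1b_2, \cdots, a_1b_N, a_2b_1, a_2b_2, \cdots, a_2b_N, \cdots, a_Mb_1, a_Mb_2, \cdots, a_Mb_N\right).
\end{aligned} \tag{10.241}
\]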


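We introduce the following two-dimensional fundamental vectors: 

\[
|1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |2\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \tag{10.242}
\]

\[
\langle 1| = (1, 0), \qquad \langle 2| = (0, 1). \tag{10.243}
\]

By using the fundamental vectors in two-dimensional space, we define the vertical 
and horizontal fundamental vectors in four-dimensional space by using the tensor 
product as follows: 

\[
\begin{aligned}
|1,1\rangle &\equiv |1\rangle \otimes |1\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, &
|1,2\rangle &\equiv |1\rangle \otimes |2\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \\
|2,1\rangle &\equiv |2\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, &
|2,2\rangle &\equiv |2\rangle \otimes |2\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\end{aligned} \tag{10.244}
\]

\[
\begin{aligned}
\langle 1,1| &\equiv \langle 1| \otimes \langle 1| = (1, 0, 0, 0), &
\langle 1,2| &\equiv \langle 1| \otimes \langle 2| = (0, 1, 0, 0), \\
\langle 2,1| &\equiv \langle 2| \otimes \langle 1| = (0, 0, 1, 0), &
\langle 2,2| &\equiv \langle 2| \otimes \langle 2| = (0, 0, 0, 1).
\end{aligned} \tag{10.245}
\]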


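It is easy to confirm the following equality: 

\[
\langle i, j| \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \otimes
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} |i', j'\rangle = A_{ii'}B_{jj'}. \tag{10.246}
\]

By extending the above example to general-dimensional fundamental vectors, the 
$(i, j|i', j')$-components of $A \otimes B$ for any M x M matrix A and N x N matrix B are 
expressed as 

\[
\langle i, j|A \otimes B|i', j'\rangle = \langle i|A|i'\rangle \langle j|B|j'\rangle = A_{ii'}B_{jj'}. \tag{10.247}
\]

For M x M matrices A and C and N x N matrices B and D, we have 

\[
(A \otimes B)(C \otimes D) = (AC) \otimes (BD), \tag{10.248}
\]

and 

\[
\mathrm{Tr}[A \otimes B] = (\mathrm{Tr}[A])(\mathrm{Tr}[B]). \tag{10.249}
\]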
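The component formula (10.247) and the identities (10.248) and (10.249) correspond directly to NumPy's `np.kron`. The following sketch uses small random matrices (hypothetical test data) and 0-based indices, so the pair $(i, j)$ maps to row $iN + j$.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 3, 2
A, C = rng.normal(size=(M, M)), rng.normal(size=(M, M))
B, D = rng.normal(size=(N, N)), rng.normal(size=(N, N))

AB = np.kron(A, B)    # (MN) x (MN) matrix whose (i, i') block equals A[i, i'] * B

# Component formula (10.247) with 0-based indices: row i*N + j, column i'*N + j'
i, j, ip, jp = 2, 1, 0, 1
component = AB[i * N + j, ip * N + jp]

mixed_lhs = np.kron(A, B) @ np.kron(C, D)     # left-hand side of Eq. (10.248)
mixed_rhs = np.kron(A @ C, B @ D)             # right-hand side of Eq. (10.248)

print(np.isclose(component, A[i, ip] * B[j, jp]),
      np.allclose(mixed_lhs, mixed_rhs),
      np.isclose(np.trace(AB), np.trace(A) * np.trace(B)))
```

The same index convention is what allows a $2^{|V|} \times 2^{|V|}$ matrix to be reshaped into one axis per node later on.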
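In deriving the equality in Eq. (10.248), the $(i, j|i', j')$-components of the MN x MN 
matrix $(A \otimes B)(C \otimes D)$ are given by 

\[
\langle i, j|(A \otimes B)(C \otimes D)|i', j'\rangle
= \sum_{i''=1}^{M}\sum_{j''=1}^{N} \langle i, j|A \otimes B|i'', j''\rangle \langle i'', j''|C \otimes D|i', j'\rangle
\]
\[
(i \in \{1, 2, \cdots, M\},\ i' \in \{1, 2, \cdots, M\},\ j \in \{1, 2, \cdots, N\},\ j' \in \{1, 2, \cdots, N\}). \tag{10.250}
\]

For the M x M and N x N identity matrices $I^{(M)}$ and $I^{(N)}$, it is valid that 

\[
(A \otimes I^{(N)})(I^{(M)} \otimes B) = (I^{(M)} \otimes B)(A \otimes I^{(N)}) = A \otimes B. \tag{10.251}
\]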


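Moreover, by using mathematical induction, we can confirm the following binomial 
expansion: 

\[
\left(A \otimes I^{(N)} + I^{(M)} \otimes B\right)^n
= \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} \left(A \otimes I^{(N)}\right)^k \left(I^{(M)} \otimes B\right)^{n-k}. \tag{10.252}
\]

By using Eq. (10.252), we can derive the following equality: 

\[
\begin{aligned}
\exp\left(A \otimes I^{(N)} + I^{(M)} \otimes B\right)
&= \sum_{n=0}^{+\infty} \frac{1}{n!} \left(A \otimes I^{(N)} + I^{(M)} \otimes B\right)^n \\
&= \sum_{n=0}^{+\infty}\sum_{k=0}^{n} \frac{1}{n!}\frac{n!}{k!(n-k)!} \left(A \otimes I^{(N)}\right)^k \left(I^{(M)} \otimes B\right)^{n-k} \\
&= \sum_{k=0}^{+\infty}\sum_{n=k}^{+\infty} \frac{1}{k!(n-k)!} \left(A \otimes I^{(N)}\right)^k \left(I^{(M)} \otimes B\right)^{n-k} \\
&= \sum_{k=0}^{+\infty}\sum_{l=0}^{+\infty} \frac{1}{k!\,l!} \left(A \otimes I^{(N)}\right)^k \left(I^{(M)} \otimes B\right)^{l} \\
&= \left(\left(\sum_{k=0}^{+\infty}\frac{1}{k!}A^k\right) \otimes I^{(N)}\right)\left(I^{(M)} \otimes \left(\sum_{l=0}^{+\infty}\frac{1}{l!}B^l\right)\right) \\
&= \left(\exp(A) \otimes I^{(N)}\right)\left(I^{(M)} \otimes \exp(B)\right) \\
&= \exp(A) \otimes \exp(B).
\end{aligned} \tag{10.253}
\]

By taking the logarithm of both sides of Eq. (10.253), we have 

\[
\ln(\exp(A)) \otimes I^{(N)} + I^{(M)} \otimes \ln(\exp(B)) = \ln\left(\exp(A) \otimes \exp(B)\right). \tag{10.254}
\]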
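Equation (10.253) relies on the two terms of the Kronecker sum commuting, as stated in Eq. (10.251). A short numerical illustration, assuming random real symmetric A and B (hypothetical test data) and an eigendecomposition-based matrix exponential, is:

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.conj().T

rng = np.random.default_rng(5)
X = rng.normal(size=(3, 3))
Y = rng.normal(size=(2, 2))
A = (X + X.T) / 2                     # M x M with M = 3
B = (Y + Y.T) / 2                     # N x N with N = 2
I_M, I_N = np.eye(3), np.eye(2)

left = np.kron(A, I_N)                # A (x) I^(N)
right = np.kron(I_M, B)               # I^(M) (x) B
commutator = left @ right - right @ left

lhs = herm_fun(left + right, np.exp)                       # exp(A (x) I + I (x) B)
rhs = np.kron(herm_fun(A, np.exp), herm_fun(B, np.exp))    # exp(A) (x) exp(B)

print(np.allclose(commutator, 0.0), np.allclose(lhs, rhs))
```

For non-commuting summands the exponential does not factorize in this way, which is the situation the Suzuki-Trotter decomposition addresses.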


10.4.4 Quantum Probabilistic Graphical Models and 
Quantum Expectation-Maximization Algorithm 


This section explores a type of probabilistic graphical modeling based on Pauli spin 
matrices from the quantum statistical mechanical point of view. Our review focuses 
on the transverse Ising model in statistical mechanical informatics [37, 83, 84]. Note 
that generalization of the framework is possible. 

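Consider a graph (V, E) specified by nodes and edges, where V is the set of all 
nodes i and E is the set of all edges {i, j}. We introduce Pauli spin matrices $\sigma^x$, $\sigma^y$, and 
$\sigma^z$ as well as an identity matrix I defined by 

\[
\sigma^x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma^y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
\sigma^z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{10.255}
\]

The Pauli spin matrices at each node $i \in V = \{1, 2, \cdots, |V|\}$ are defined by 

\[
\begin{aligned}
\sigma_1^x &\equiv \sigma^x \otimes I \otimes I \otimes \cdots \otimes I \otimes I, &
\sigma_1^y &\equiv \sigma^y \otimes I \otimes I \otimes \cdots \otimes I \otimes I, &
\sigma_1^z &\equiv \sigma^z \otimes I \otimes I \otimes \cdots \otimes I \otimes I, \\
\sigma_2^x &\equiv I \otimes \sigma^x \otimes I \otimes \cdots \otimes I \otimes I, &
\sigma_2^y &\equiv I \otimes \sigma^y \otimes I \otimes \cdots \otimes I \otimes I, &
\sigma_2^z &\equiv I \otimes \sigma^z \otimes I \otimes \cdots \otimes I \otimes I, \\
&\;\;\vdots & &\;\;\vdots & &\;\;\vdots \\
\sigma_{|V|}^x &\equiv I \otimes I \otimes I \otimes \cdots \otimes I \otimes \sigma^x, &
\sigma_{|V|}^y &\equiv I \otimes I \otimes I \otimes \cdots \otimes I \otimes \sigma^y, &
\sigma_{|V|}^z &\equiv I \otimes I \otimes I \otimes \cdots \otimes I \otimes \sigma^z.
\end{aligned} \tag{10.256}
\]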
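The node operators of Eq. (10.256) are chains of Kronecker products and can be built mechanically. The sketch below assumes the standard Pauli matrices with $\sigma^z = \mathrm{diag}(1, -1)$; the helper `site_operator` is a hypothetical name, and nodes are indexed from 0 rather than 1.

```python
import numpy as np
from functools import reduce

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def site_operator(op, i, n_nodes):
    """Return I (x) ... (x) op (x) ... (x) I with op in slot i (0-based) of n_nodes slots."""
    factors = [op if k == i else I2 for k in range(n_nodes)]
    return reduce(np.kron, factors)

n = 3
sz = [site_operator(SZ, i, n) for i in range(n)]
sx = [site_operator(SX, i, n) for i in range(n)]

# Operators on different nodes commute; sigma^z and sigma^x on the same node anticommute.
print(np.allclose(sz[0] @ sx[1], sx[1] @ sz[0]),
      np.allclose(sz[0] @ sx[0], -sx[0] @ sz[0]))
```

The matrices grow as $2^{|V|} \times 2^{|V|}$, so this explicit construction is only practical for very small graphs; it is meant for checking formulas, not for large-scale computation.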


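The vertical and horizontal $|\Omega|^{|V|}$-dimensional state vectors are defined by 

\[
|s_1, s_2, \cdots, s_{|V|}\rangle \equiv |s_1\rangle \otimes |s_2\rangle \otimes \cdots \otimes |s_{|V|}\rangle
\quad (s_1 \in \Omega,\ s_2 \in \Omega,\ \cdots,\ s_{|V|} \in \Omega), \tag{10.257}
\]

\[
\langle s_1, s_2, \cdots, s_{|V|}| \equiv \langle s_1| \otimes \langle s_2| \otimes \cdots \otimes \langle s_{|V|}|
\quad (s_1 \in \Omega,\ s_2 \in \Omega,\ \cdots,\ s_{|V|} \in \Omega), \tag{10.258}
\]

where 

\[
\langle +1| = (1, 0), \quad \langle -1| = (0, 1), \quad
|+1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |-1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{10.259}
\]

By using the state vector representations, the $(s_1, s_2, \cdots, s_{|V|}|s'_1, s'_2, \cdots, s'_{|V|})$-elements 
of $\sigma_i^x$, $\sigma_i^z$, and $\sigma_i^z\sigma_j^z$ are given as 

\[
\langle s_1, s_2, \cdots, s_{|V|}|\sigma_i^x|s'_1, s'_2, \cdots, s'_{|V|}\rangle
= \left(\prod_{k \in V\backslash\{i\}} \delta_{s_k, s'_k}\right) \langle s_i|\sigma^x|s'_i\rangle
\]
\[
(s_1 \in \Omega, \cdots, s_{|V|} \in \Omega;\ s'_1 \in \Omega, \cdots, s'_{|V|} \in \Omega), \tag{10.260}
\]

\[
\langle s_1, s_2, \cdots, s_{|V|}|\sigma_i^z|s'_1, s'_2, \cdots, s'_{|V|}\rangle
= \left(\prod_{k \in V\backslash\{i\}} \delta_{s_k, s'_k}\right) \langle s_i|\sigma^z|s'_i\rangle
\]
\[
(s_1 \in \Omega, \cdots, s_{|V|} \in \Omega;\ s'_1 \in \Omega, \cdots, s'_{|V|} \in \Omega), \tag{10.261}
\]

\[
\langle s_1, s_2, \cdots, s_{|V|}|\sigma_i^z\sigma_j^z|s'_1, s'_2, \cdots, s'_{|V|}\rangle
= \left(\prod_{k \in V\backslash\{i, j\}} \delta_{s_k, s'_k}\right) \langle s_i, s_j|(\sigma^z \otimes I)(I \otimes \sigma^z)|s'_i, s'_j\rangle
\]
\[
(s_1 \in \Omega, \cdots, s_{|V|} \in \Omega;\ s'_1 \in \Omega, \cdots, s'_{|V|} \in \Omega). \tag{10.262}
\]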

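The prior density matrix $P(\alpha, \gamma)$ and the data generative density matrix $P(d|\beta)$ 
for a given data vector d are assumed to be 

\[
P(\alpha, \gamma) \equiv
\frac{\exp\left(-\frac{1}{2}\alpha \sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2 + \gamma \sum_{i \in V} \sigma_i^x\right)}
{\mathrm{Tr}\left[\exp\left(-\frac{1}{2}\alpha \sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2 + \gamma \sum_{i \in V} \sigma_i^x\right)\right]}, \tag{10.263}
\]

\[
P(d|\beta) \equiv \left(\frac{\beta}{2\pi}\right)^{|V|/2}
\exp\left(-\frac{1}{2}\beta \sum_{i \in V} (d_i I - \sigma_i^z)^2\right), \tag{10.264}
\]

where $\alpha$, $\beta$, and $\gamma$ are hyperparameters. The data generative density matrix $P(d|\beta)$ 
is expressed as a $|\Omega|^{|V|} \times |\Omega|^{|V|}$ diagonal matrix in which all the off-diagonal 
elements are zero. Each diagonal element $\langle s_1, s_2, \cdots, s_{|V|}|P(d|\beta)|s_1, s_2, \cdots, s_{|V|}\rangle$ 
$((s_1, s_2, \cdots, s_{|V|})^{\mathrm{T}} \in \Omega^{|V|})$ corresponds to the probability of the data vector d 
according to additive white Gaussian noise when the state vector $(s_1, s_2, \cdots, s_{|V|})$ is given, 
and $\beta$ corresponds to the inverse of the variance of the additive white Gaussian noise. 
By considering a quantum statistical mechanical extension of the Bayes formula, a 
posterior density matrix $P(d, \alpha, \beta, \gamma)$ and a joint density matrix $P(d|\alpha, \beta, \gamma)$ can 
be expressed as follows: 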


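\[
\begin{aligned}
P(d, \alpha, \beta, \gamma)
&\equiv \frac{\exp\left(\ln(P(d|\beta)) + \ln(P(\alpha, \gamma))\right)}
{\mathrm{Tr}\left[\exp\left(\ln(P(d|\beta)) + \ln(P(\alpha, \gamma))\right)\right]} \\
&= \frac{\exp\left(-\frac{1}{2}\alpha \sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2
 - \frac{1}{2}\beta \sum_{i \in V} (d_i I - \sigma_i^z)^2 + \gamma \sum_{i \in V} \sigma_i^x\right)}
{\mathrm{Tr}\left[\exp\left(-\frac{1}{2}\alpha \sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2
 - \frac{1}{2}\beta \sum_{i \in V} (d_i I - \sigma_i^z)^2 + \gamma \sum_{i \in V} \sigma_i^x\right)\right]},
\end{aligned} \tag{10.265}
\]

\[
\begin{aligned}
P(d|\alpha, \beta, \gamma)
&\equiv \exp\left(\ln(P(d|\beta)) + \ln(P(\alpha, \gamma))\right) \\
&= \left(\frac{\beta}{2\pi}\right)^{|V|/2}
\frac{\exp\left(-\frac{1}{2}\alpha \sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2
 - \frac{1}{2}\beta \sum_{i \in V} (d_i I - \sigma_i^z)^2 + \gamma \sum_{i \in V} \sigma_i^x\right)}
{\mathrm{Tr}\left[\exp\left(-\frac{1}{2}\alpha \sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2 + \gamma \sum_{i \in V} \sigma_i^x\right)\right]}.
\end{aligned} \tag{10.266}
\]

The estimates of the hyperparameters, $(\hat{\alpha}(d), \hat{\beta}(d), \hat{\gamma}(d))$, are those that maximize 
the marginal likelihood $\mathrm{Tr}[P(d|\alpha, \beta, \gamma)]$ as follows: 

\[
(\hat{\alpha}(d), \hat{\beta}(d), \hat{\gamma}(d)) = \arg\max_{(\alpha, \beta, \gamma)} \mathrm{Tr}\left[P(d|\alpha, \beta, \gamma)\right]. \tag{10.267}
\]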
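For a graph with a handful of nodes, the marginal likelihood in Eq. (10.267) can be evaluated by brute force. The sketch below is not the quantum EM algorithm of this section: it is a plain grid search over a hypothetical three-node chain with made-up data, intended only to make Eqs. (10.263), (10.264), and (10.266) concrete. All function names are assumptions. For $\gamma = 0$ every matrix is diagonal, so the result can be compared with the classical sum over spin configurations.

```python
import numpy as np
from functools import reduce

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])

def herm_exp(K):
    lam, U = np.linalg.eigh(K)                 # K is real symmetric here
    return (U * np.exp(lam)) @ U.T

def site_operator(op, i, n):
    return reduce(np.kron, [op if k == i else np.eye(2) for k in range(n)])

def log_marginal_likelihood(d, edges, alpha, beta, gamma):
    """ln Tr[P(d|alpha, beta, gamma)] assembled from Eqs. (10.263), (10.264), (10.266)."""
    n = len(d)
    I = np.eye(2 ** n)
    sz = [site_operator(SZ, i, n) for i in range(n)]
    sx = [site_operator(SX, i, n) for i in range(n)]
    prior_exponent = gamma * sum(sx)
    for i, j in edges:
        diff = sz[i] - sz[j]
        prior_exponent = prior_exponent - 0.5 * alpha * diff @ diff
    data_exponent = np.zeros_like(I)
    for i in range(n):
        diff = d[i] * I - sz[i]
        data_exponent = data_exponent - 0.5 * beta * diff @ diff
    log_z_prior = np.log(np.trace(herm_exp(prior_exponent)))
    log_norm = 0.5 * n * np.log(beta / (2.0 * np.pi))
    return np.log(np.trace(herm_exp(prior_exponent + data_exponent))) + log_norm - log_z_prior

def classical_log_marginal(d, edges, alpha, beta):
    """Classical counterpart (gamma = 0): explicit sum over all spin configurations."""
    n = len(d)
    states = [[1 - 2 * ((idx >> k) & 1) for k in range(n)] for idx in range(2 ** n)]
    prior = np.array([np.exp(-0.5 * alpha * sum((s[i] - s[j]) ** 2 for i, j in edges))
                      for s in states])
    like = np.array([(beta / (2 * np.pi)) ** (n / 2)
                     * np.exp(-0.5 * beta * sum((d[i] - s[i]) ** 2 for i in range(n)))
                     for s in states])
    return np.log(np.sum(prior * like) / np.sum(prior))

d = [0.9, -1.1, 1.2]                           # hypothetical data vector
edges = [(0, 1), (1, 2)]                       # hypothetical chain graph
grid = [(a, b, g) for a in (0.1, 0.5, 1.0) for b in (0.5, 1.0, 2.0) for g in (0.0, 0.3, 0.6)]
scores = [log_marginal_likelihood(d, edges, a, b, g) for a, b, g in grid]
best = grid[int(np.argmax(scores))]

quantum_at_zero = log_marginal_likelihood(d, edges, 0.5, 2.0, 0.0)
classical = classical_log_marginal(d, edges, 0.5, 2.0)
print(best, quantum_at_zero - classical)
```

The cost of the grid search grows exponentially with the number of nodes, which is one motivation for iterative schemes such as the quantum EM algorithm and for the approximate methods of Sect. 10.5.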


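To achieve the estimation criteria for hyperparameters $\alpha$, $\beta$, and $\gamma$ in Eq. (10.267), 
we extend the Q-function in Eq. (10.8) to the following expression from a quantum 
statistical mechanical point of view: 

\[
Q(\alpha, \beta, \gamma|\alpha', \beta', \gamma', d) \equiv \mathrm{Tr}\left[P(d, \alpha', \beta', \gamma')\ln\left(P(d|\alpha, \beta, \gamma)\right)\right]. \tag{10.268}
\]

The quantum EM algorithm can be summarized as a procedure consisting of the 
following E- and M-steps, which are repeated for $t = 0, 1, 2, \cdots$ until $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ converge: 

E-step: Compute $Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d)$ for various values of $\alpha$, $\beta$, and $\gamma$. 
M-step: Determine $(\alpha(t+1), \beta(t+1), \gamma(t+1))$ so as to satisfy the extremum conditions of 
$Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d)$ with respect to $\alpha$, $\beta$, and $\gamma$. Update $\hat{\alpha} \leftarrow \alpha(t+1)$, 
$\hat{\beta} \leftarrow \beta(t+1)$, and $\hat{\gamma} \leftarrow \gamma(t+1)$. 

The quantum EM algorithm can obtain the solution of the extremum condition of the 
marginal likelihood $\mathrm{Tr}[P(d|\alpha, \beta, \gamma)]$ because we have the following equalities: 

\[
\begin{aligned}
\left[\frac{\partial}{\partial \alpha} Q(\alpha, \beta, \gamma|\hat{\alpha}, \hat{\beta}, \hat{\gamma}, d)\right]_{(\alpha, \beta, \gamma) = (\hat{\alpha}, \hat{\beta}, \hat{\gamma})}
&= \left[\frac{\partial}{\partial \alpha} \ln\left(\mathrm{Tr}\left[P(d|\alpha, \beta, \gamma)\right]\right)\right]_{(\alpha, \beta, \gamma) = (\hat{\alpha}, \hat{\beta}, \hat{\gamma})}, \\
\left[\frac{\partial}{\partial \beta} Q(\alpha, \beta, \gamma|\hat{\alpha}, \hat{\beta}, \hat{\gamma}, d)\right]_{(\alpha, \beta, \gamma) = (\hat{\alpha}, \hat{\beta}, \hat{\gamma})}
&= \left[\frac{\partial}{\partial \beta} \ln\left(\mathrm{Tr}\left[P(d|\alpha, \beta, \gamma)\right]\right)\right]_{(\alpha, \beta, \gamma) = (\hat{\alpha}, \hat{\beta}, \hat{\gamma})}, \\
\left[\frac{\partial}{\partial \gamma} Q(\alpha, \beta, \gamma|\hat{\alpha}, \hat{\beta}, \hat{\gamma}, d)\right]_{(\alpha, \beta, \gamma) = (\hat{\alpha}, \hat{\beta}, \hat{\gamma})}
&= \left[\frac{\partial}{\partial \gamma} \ln\left(\mathrm{Tr}\left[P(d|\alpha, \beta, \gamma)\right]\right)\right]_{(\alpha, \beta, \gamma) = (\hat{\alpha}, \hat{\beta}, \hat{\gamma})}.
\end{aligned} \tag{10.269}
\]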


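By substituting Eq. (10.265) into Eq. (10.268), the Q-function can be rewritten as 
follows: 

\[
\begin{aligned}
Q(\alpha, \beta, \gamma|\alpha', \beta', \gamma', d)
&= -\frac{1}{2}\alpha \sum_{\{i,j\} \in E} \mathrm{Tr}\left[(\sigma^z \otimes I - I \otimes \sigma^z)^2 P_{ij}(d, \alpha', \beta', \gamma')\right] \\
&\quad - \frac{1}{2}\beta \sum_{i \in V} \mathrm{Tr}\left[(d_i I - \sigma^z)^2 P_i(d, \alpha', \beta', \gamma')\right]
 + \gamma \sum_{i \in V} \mathrm{Tr}\left[\sigma^x P_i(d, \alpha', \beta', \gamma')\right] \\
&\quad + \frac{|V|}{2}\ln\left(\frac{\beta}{2\pi}\right)
 - \ln\left(\mathrm{Tr}\left[\exp\left(-\frac{1}{2}\alpha \sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2 + \gamma \sum_{i \in V} \sigma_i^x\right)\right]\right).
\end{aligned} \tag{10.270}
\]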


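The extremum conditions of $Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d)$ with respect to $\alpha$, $\beta$, 
and $\gamma$, such that 

\[
\begin{aligned}
\frac{\partial}{\partial \alpha} Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d) &= 0, \\
\frac{\partial}{\partial \beta} Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d) &= 0, \\
\frac{\partial}{\partial \gamma} Q(\alpha, \beta, \gamma|\alpha(t), \beta(t), \gamma(t), d) &= 0,
\end{aligned} \tag{10.271}
\]

can be reduced to the following simultaneous update rules in the quantum EM algorithm: 

\[
\begin{aligned}
&\sum_{\{i,j\} \in E} \mathrm{Tr}\left[(\sigma^z \otimes I - I \otimes \sigma^z)^2 P_{ij}(\alpha(t+1), \gamma(t+1))\right] \\
&\quad = \sum_{\{i,j\} \in E} \mathrm{Tr}\left[(\sigma^z \otimes I - I \otimes \sigma^z)^2 P_{ij}(d, \alpha(t), \beta(t), \gamma(t))\right],
\end{aligned} \tag{10.272}
\]

\[
\frac{1}{\beta(t+1)} = \frac{1}{|V|}\sum_{i \in V} \mathrm{Tr}\left[(d_i I - \sigma^z)^2 P_i(d, \alpha(t), \beta(t), \gamma(t))\right], \tag{10.273}
\]

\[
\sum_{i \in V} \mathrm{Tr}\left[\sigma^x P_i(\alpha(t+1), \gamma(t+1))\right]
= \sum_{i \in V} \mathrm{Tr}\left[\sigma^x P_i(d, \alpha(t), \beta(t), \gamma(t))\right], \tag{10.274}
\]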
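where 

\[
\begin{aligned}
\langle s_i|P_i(d, \alpha, \beta, \gamma)|s'_i\rangle
&= \langle s_i|\mathrm{Tr}_{\backslash i} P(d, \alpha, \beta, \gamma)|s'_i\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\sum_{\tau_2 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\sum_{\tau'_2 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s'_i, \tau'_i}
 \left(\prod_{j \in V\backslash\{i\}} \delta_{\tau_j, \tau'_j}\right) \\
&\qquad \times \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|P(d, \alpha, \beta, \gamma)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s'_i \in \Omega,\ i \in V), \tag{10.275}
\]

\[
\begin{aligned}
\langle s_i, s_j|P_{ij}(d, \alpha, \beta, \gamma)|s'_i, s'_j\rangle
&= \langle s_j, s_i|P_{ji}(d, \alpha, \beta, \gamma)|s'_j, s'_i\rangle \\
&= \langle s_i, s_j|\mathrm{Tr}_{\backslash\{i,j\}} P(d, \alpha, \beta, \gamma)|s'_i, s'_j\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s_j, \tau_j}\delta_{s'_i, \tau'_i}\delta_{s'_j, \tau'_j}
 \left(\prod_{k \in V\backslash\{i,j\}} \delta_{\tau_k, \tau'_k}\right) \\
&\qquad \times \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|P(d, \alpha, \beta, \gamma)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s_j \in \Omega,\ s'_i \in \Omega,\ s'_j \in \Omega,\ i \in V,\ j \in V,\ i < j), \tag{10.276}
\]

\[
\begin{aligned}
\langle s_i, s_j|P_{ij}(\alpha, \gamma)|s'_i, s'_j\rangle
&= \langle s_j, s_i|P_{ji}(\alpha, \gamma)|s'_j, s'_i\rangle \\
&= \langle s_i, s_j|\mathrm{Tr}_{\backslash\{i,j\}} P(\alpha, \gamma)|s'_i, s'_j\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s_j, \tau_j}\delta_{s'_i, \tau'_i}\delta_{s'_j, \tau'_j}
 \left(\prod_{k \in V\backslash\{i,j\}} \delta_{\tau_k, \tau'_k}\right) \\
&\qquad \times \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|P(\alpha, \gamma)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s_j \in \Omega,\ s'_i \in \Omega,\ s'_j \in \Omega,\ i \in V,\ j \in V,\ i < j). \tag{10.277}
\]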

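Finally, we explain how the state at each node is estimated from the reduced 
posterior density matrix $P_i(d, \alpha, \beta, \gamma)$ in Eq. (10.275) for each node i ($\in V$). The 
reduced posterior density matrix $P_i(d, \alpha, \beta, \gamma)$ is a real symmetric matrix and can 
be diagonalized as 

\[
\begin{aligned}
P_i(d, \alpha, \beta, \gamma) &=
\begin{pmatrix}
\psi_i^{(1)}(+1|d, \alpha, \beta, \gamma) & \psi_i^{(2)}(+1|d, \alpha, \beta, \gamma) \\
\psi_i^{(1)}(-1|d, \alpha, \beta, \gamma) & \psi_i^{(2)}(-1|d, \alpha, \beta, \gamma)
\end{pmatrix}
\begin{pmatrix}
P_i(1|d, \alpha, \beta, \gamma) & 0 \\
0 & P_i(2|d, \alpha, \beta, \gamma)
\end{pmatrix} \\
&\quad \times
\begin{pmatrix}
\psi_i^{(1)}(+1|d, \alpha, \beta, \gamma) & \psi_i^{(2)}(+1|d, \alpha, \beta, \gamma) \\
\psi_i^{(1)}(-1|d, \alpha, \beta, \gamma) & \psi_i^{(2)}(-1|d, \alpha, \beta, \gamma)
\end{pmatrix}^{\mathrm{T}},
\end{aligned} \tag{10.278}
\]

where the eigenvalues, $P_i(1|d, \alpha, \beta, \gamma)$ and $P_i(2|d, \alpha, \beta, \gamma)$, are always real numbers. 
The vectors $(\psi_i^{(1)}(+1|d, \alpha, \beta, \gamma), \psi_i^{(1)}(-1|d, \alpha, \beta, \gamma))^{\mathrm{T}}$ and 
$(\psi_i^{(2)}(+1|d, \alpha, \beta, \gamma), \psi_i^{(2)}(-1|d, \alpha, \beta, \gamma))^{\mathrm{T}}$ correspond to the eigenvectors for the 
eigenvalues $P_i(1|d, \alpha, \beta, \gamma)$ and $P_i(2|d, \alpha, \beta, \gamma)$, such that 

\[
P_i(d, \alpha, \beta, \gamma)
\begin{pmatrix} \psi_i^{(n)}(+1|d, \alpha, \beta, \gamma) \\ \psi_i^{(n)}(-1|d, \alpha, \beta, \gamma) \end{pmatrix}
= P_i(n|d, \alpha, \beta, \gamma)
\begin{pmatrix} \psi_i^{(n)}(+1|d, \alpha, \beta, \gamma) \\ \psi_i^{(n)}(-1|d, \alpha, \beta, \gamma) \end{pmatrix}
\quad (i \in V,\ n \in \{1, 2\}). \tag{10.279}
\]

This means that the eigenvectors correspond to all possible states, and the probabilities of 
the states $(\psi_i^{(1)}(+1|d, \alpha, \beta, \gamma), \psi_i^{(1)}(-1|d, \alpha, \beta, \gamma))^{\mathrm{T}}$ and 
$(\psi_i^{(2)}(+1|d, \alpha, \beta, \gamma), \psi_i^{(2)}(-1|d, \alpha, \beta, \gamma))^{\mathrm{T}}$ are $P_i(1|d, \alpha, \beta, \gamma)$ 
and $P_i(2|d, \alpha, \beta, \gamma)$, respectively, in the reduced density matrix $P_i(d, \alpha, \beta, \gamma)$. The 
estimates for the state at each node i ($\in V$), $(\hat{\psi}_i(+1|d, \hat{\alpha}, \hat{\beta}, \hat{\gamma}), \hat{\psi}_i(-1|d, \hat{\alpha}, \hat{\beta}, \hat{\gamma}))^{\mathrm{T}}$, are given by 

\[
\begin{pmatrix} \hat{\psi}_i(+1|d, \hat{\alpha}, \hat{\beta}, \hat{\gamma}) \\ \hat{\psi}_i(-1|d, \hat{\alpha}, \hat{\beta}, \hat{\gamma}) \end{pmatrix}
= \arg\max P_i(d, \hat{\alpha}, \hat{\beta}, \hat{\gamma}) \quad (i \in V). \tag{10.280}
\]

These estimation criteria in Eqs. (10.267) and (10.280) correspond to quantum 
statistical mechanical extensions of the maximizations of marginal likelihood and 
posterior marginal. 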
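The reduction from the full density matrix to a single-node matrix, followed by the eigenvector selection of Eq. (10.280), can be written compactly with array reshaping. In the sketch below the "posterior" is merely a random positive definite matrix with unit trace on three nodes (an assumption made for illustration), and `reduced_density_matrix` is a hypothetical helper implementing the partial trace over all nodes except i (0-based).

```python
import numpy as np
from functools import reduce

def reduced_density_matrix(P, i, n):
    """Partial trace of a 2^n x 2^n matrix P over every node except node i (0-based)."""
    T = P.reshape([2] * (2 * n))                # axes: (s_1..s_n, s'_1..s'_n)
    T = np.moveaxis(T, [i, n + i], [0, 1])      # bring s_i and s'_i to the front
    T = T.reshape(2, 2, 2 ** (n - 1), 2 ** (n - 1))
    return np.trace(T, axis1=2, axis2=3)        # sum over matching remaining indices

rng = np.random.default_rng(6)
n = 3
X = rng.normal(size=(2 ** n, 2 ** n))
K = (X + X.T) / 2                               # hypothetical real symmetric exponent
lam, U = np.linalg.eigh(K)
P = (U * np.exp(lam)) @ U.T
P = P / np.trace(P)                             # stand-in for a posterior density matrix

SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
i = 1
P_i = reduced_density_matrix(P, i, n)
sz_i = reduce(np.kron, [SZ if k == i else np.eye(2) for k in range(n)])

probs, psi = np.linalg.eigh(P_i)                # ascending eigenvalues, eigenvectors in columns
estimate = psi[:, np.argmax(probs)]             # "argmax P_i" in the notation of Eq. (10.202)

print(np.isclose(np.trace(sz_i @ P), np.trace(SZ @ P_i)), probs, estimate)
```

The equality of the two traces printed first is the single-node statement that expectation values of a node operator depend only on the reduced density matrix of that node.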


10.4.5 Quantum Expectation-Maximization (EM) Algorithm 
for Probabilistic Image Segmentation 


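This section applies the framework of Sect. 10.4.4 to the EM algorithm for 
probabilistic image segmentations in Sect. 10.2.3. In our present framework, Hubbard 
operators [85] are used instead of Pauli spin matrices. 

First, we introduce Hubbard operators $X_i^{(\tau, \tau')}$ at each node i ($\in V$) as follows: 

\[
\begin{aligned}
X_1^{(\tau, \tau')} &\equiv X^{(\tau, \tau')} \otimes I \otimes I \otimes \cdots \otimes I \otimes I, \\
X_2^{(\tau, \tau')} &\equiv I \otimes X^{(\tau, \tau')} \otimes I \otimes \cdots \otimes I \otimes I, \\
&\;\;\vdots \\
X_{|V|}^{(\tau, \tau')} &\equiv I \otimes I \otimes I \otimes \cdots \otimes I \otimes X^{(\tau, \tau')}
\end{aligned}
\quad (\tau \in \Omega,\ \tau' \in \Omega), \tag{10.281}
\]

where 

\[
X^{(+1,+1)} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
X^{(-1,-1)} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad
X^{(+1,-1)} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
X^{(-1,+1)} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. \tag{10.282}
\]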
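With the basis vectors of Eq. (10.259), the four matrices in Eq. (10.282) coincide with the outer products $|\tau\rangle\langle\tau'|$, which gives a convenient way to generate them and to relate them to the Pauli matrices. The dictionary-based layout below is an assumption made for illustration.

```python
import numpy as np

# Basis vectors |+1> and |-1> as in Eq. (10.259)
ket = {+1: np.array([1.0, 0.0]), -1: np.array([0.0, 1.0])}

# X[(s, t)] = |s><t| reproduces the four 2 x 2 matrices of Eq. (10.282)
X = {(s, t): np.outer(ket[s], ket[t]) for s in (+1, -1) for t in (+1, -1)}

sigma_z = X[(+1, +1)] - X[(-1, -1)]      # diagonal Hubbard operators give sigma^z
sigma_x = X[(+1, -1)] + X[(-1, +1)]      # off-diagonal ones give sigma^x
identity = X[(+1, +1)] + X[(-1, -1)]

print(sigma_z, sigma_x, identity, sep="\n")
```

The diagonal operators $X^{(\tau,\tau)}$ act as projectors onto the label $\tau$, which is why they can select label-dependent parameters such as a mean vector or a covariance matrix for each node.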


In probabilistic segmentation and clustering, $\rho(D|s, a(+1), a(-1), C(+1), C(-1))$ 
in Eq. (10.29) and $P(s|\alpha)$ in Eq. (10.30) correspond to the data generative and prior 
models, respectively. By using the Hubbard operators and extending Eq. (10.29) 
and Eq. (10.30) from the standpoint of quantum statistical mechanical informatics, 
the density matrices of the data generative model and the prior model in quantum 
machine learning systems for probabilistic image processing can be expressed as 
follows: 
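\[
\begin{aligned}
&P(D|a(+1), a(-1), C(+1), C(-1)) \\
&\quad \equiv \prod_{i \in V}\sum_{s_i \in \Omega} X_i^{(s_i, s_i)}
 \sqrt{\frac{1}{\det(2\pi C(s_i))}}
 \exp\left(-\frac{1}{2}(d_i - a(s_i))^{\mathrm{T}} C^{-1}(s_i)(d_i - a(s_i))\right) \\
&\quad = \exp\left(-\frac{1}{2}\sum_{i \in V}\sum_{s_i \in \Omega}
 \left((d_i - a(s_i))^{\mathrm{T}} C^{-1}(s_i)(d_i - a(s_i)) + \ln\left(\det(2\pi C(s_i))\right)\right) X_i^{(s_i, s_i)}\right),
\end{aligned} \tag{10.283}
\]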


e > qe") - P EDD Qu E XU CUM, 4. уу ath» E pen) 
{ЛЕЕ ieV 


R(a,y) = H 
mfes E У a? и) ир еи = XRD CLD) $y dixie? exem 
{i jJeE ieV 
(10.284) 
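where 

\[
a(+1) = \begin{pmatrix} a_{\mathrm{R}}(+1) \\ a_{\mathrm{G}}(+1) \\ a_{\mathrm{B}}(+1) \end{pmatrix}, \qquad
a(-1) = \begin{pmatrix} a_{\mathrm{R}}(-1) \\ a_{\mathrm{G}}(-1) \\ a_{\mathrm{B}}(-1) \end{pmatrix}, \tag{10.285}
\]

\[
C(+1) = \begin{pmatrix}
C_{\mathrm{RR}}(+1) & C_{\mathrm{RG}}(+1) & C_{\mathrm{RB}}(+1) \\
C_{\mathrm{GR}}(+1) & C_{\mathrm{GG}}(+1) & C_{\mathrm{GB}}(+1) \\
C_{\mathrm{BR}}(+1) & C_{\mathrm{BG}}(+1) & C_{\mathrm{BB}}(+1)
\end{pmatrix}, \qquad
C(-1) = \begin{pmatrix}
C_{\mathrm{RR}}(-1) & C_{\mathrm{RG}}(-1) & C_{\mathrm{RB}}(-1) \\
C_{\mathrm{GR}}(-1) & C_{\mathrm{GG}}(-1) & C_{\mathrm{GB}}(-1) \\
C_{\mathrm{BR}}(-1) & C_{\mathrm{BG}}(-1) & C_{\mathrm{BB}}(-1)
\end{pmatrix}. \tag{10.286}
\]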

The joint density matrix of s and D is expressed in terms of the data generative and 
prior density matrices as follows: 

\[
\begin{aligned}
&P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1)) \\
&\quad \equiv \exp\left(\ln\left(P(D|a(+1), a(-1), C(+1), C(-1))\right) + \ln\left(P(\alpha, \gamma)\right)\right).
\end{aligned} \tag{10.287}
\]

By using the joint density matrix $P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))$, the 
posterior density matrix $P(D, \alpha, \gamma, a(+1), a(-1), C(+1), C(-1))$ is defined by 
using the Bayes formula as follows: 

\[
P(D, \alpha, \gamma, a(+1), a(-1), C(+1), C(-1))
\equiv \frac{P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))}
{\mathrm{Tr}\left[P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))\right]}. \tag{10.288}
\]
Estimates of the hyperparameters and parameter vector, $\hat{\alpha}(D)$, $\hat{\gamma}(D)$, $\hat{a}(+1|D)$, 
$\hat{a}(-1|D)$, $\hat{C}(+1|D)$, $\hat{C}(-1|D)$, are given by 

\[
\begin{aligned}
&\left(\hat{\alpha}(D), \hat{\gamma}(D), \hat{a}(+1|D), \hat{a}(-1|D), \hat{C}(+1|D), \hat{C}(-1|D)\right) \\
&\quad = \arg\max_{(\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))}
 \mathrm{Tr}\left[P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))\right].
\end{aligned} \tag{10.289}
\]

The parameter vector $\hat{s}(D) = (\hat{s}_1(D), \hat{s}_2(D), \cdots, \hat{s}_{|V|}(D))$ can be estimated 
from the reduced posterior marginal density matrix at each node i of 
$P(D, \alpha, \gamma, a(+1), a(-1), C(+1), C(-1))$ by similar arguments to those for Eqs. 
(10.278), (10.279), and (10.280). 

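The Q-function for the EM algorithm in the present framework is defined by 

\[
\begin{aligned}
&Q(\alpha, \gamma, a(+1), a(-1), C(+1), C(-1)|\alpha', \gamma', a'(+1), a'(-1), C'(+1), C'(-1), D) \\
&\quad \equiv \mathrm{Tr}\Big[P(D, \alpha', \gamma', a'(+1), a'(-1), C'(+1), C'(-1)) \\
&\qquad\qquad \times \ln\left(P(D|\alpha, \gamma, a(+1), a(-1), C(+1), C(-1))\right)\Big].
\end{aligned} \tag{10.290}
\]

The EM algorithm is a procedure that performs the following E- and M-steps repeatedly 
for $t = 0, 1, 2, \cdots$ until $\hat{\alpha}(D)$, $\hat{\gamma}(D)$, $\hat{a}(+1|D)$, $\hat{a}(-1|D)$, $\hat{C}(+1|D)$, and $\hat{C}(-1|D)$ 
converge: 

E-step: Compute $Q(\alpha, \gamma, a(+1), a(-1), C(+1), C(-1)|\alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t), D)$ 
for various values of $\alpha$, $\gamma$, $a(+1)$, $a(-1)$, $C(+1)$, and $C(-1)$. 

M-step: Determine $\alpha(t+1)$, $\gamma(t+1)$, $a(+1, t+1)$, $a(-1, t+1)$, $C(+1, t+1)$, and 
$C(-1, t+1)$ so as to satisfy the extremum conditions of the Q-function with respect to 
$\alpha$, $\gamma$, $a(+1)$, $a(-1)$, $C(+1)$, and $C(-1)$ as follows: 

\[
\begin{aligned}
&\left(\alpha(t+1), \gamma(t+1), a(+1, t+1), a(-1, t+1), C(+1, t+1), C(-1, t+1)\right) \\
&\quad \leftarrow \mathop{\mathrm{extremum}}_{\alpha, \gamma, a(+1), a(-1), C(+1), C(-1)}
 Q(\alpha, \gamma, a(+1), a(-1), C(+1), C(-1)| \\
&\qquad\qquad \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t), D).
\end{aligned} \tag{10.291}
\]

Update $\hat{\alpha}(D) \leftarrow \alpha(t+1)$, $\hat{\gamma}(D) \leftarrow \gamma(t+1)$, $\hat{a}(+1|D) \leftarrow a(+1, t+1)$, 
$\hat{a}(-1|D) \leftarrow a(-1, t+1)$, $\hat{C}(+1|D) \leftarrow C(+1, t+1)$, and $\hat{C}(-1|D) \leftarrow C(-1, t+1)$. 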
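By using some equalities in Eqs. (10.283), (10.284), (10.287), and (10.288), the EM 
algorithm using the Q-function can be reduced to the following expressions: 

\[
\begin{aligned}
&\frac{1}{|E|}\sum_{\{i,j\} \in E} \mathrm{Tr}\left[\left(I \otimes I - X^{(+1,+1)} \otimes X^{(+1,+1)} - X^{(-1,-1)} \otimes X^{(-1,-1)}\right)
 P_{ij}(\alpha(t+1), \gamma(t+1))\right] \\
&\quad = \frac{1}{|E|}\sum_{\{i,j\} \in E} \mathrm{Tr}\Big[\left(I \otimes I - X^{(+1,+1)} \otimes X^{(+1,+1)} - X^{(-1,-1)} \otimes X^{(-1,-1)}\right) \\
&\qquad\qquad \times P_{ij}(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\Big],
\end{aligned} \tag{10.292}
\]

\[
\begin{aligned}
&\frac{1}{|V|}\sum_{i \in V} \mathrm{Tr}\left[\left(X^{(+1,-1)} + X^{(-1,+1)}\right) P_i(\alpha(t+1), \gamma(t+1))\right] \\
&\quad = \frac{1}{|V|}\sum_{i \in V} \mathrm{Tr}\left[\left(X^{(+1,-1)} + X^{(-1,+1)}\right)
 P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right],
\end{aligned} \tag{10.293}
\]

\[
a(\xi, t+1) = \frac{\sum_{i \in V} d_i\, \mathrm{Tr}\left[X^{(\xi, \xi)} P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right]}
{\sum_{i \in V} \mathrm{Tr}\left[X^{(\xi, \xi)} P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right]}
\quad (\xi \in \Omega), \tag{10.294}
\]

\[
C(\xi, t+1) = \frac{\sum_{i \in V} (d_i - a(\xi, t+1))(d_i - a(\xi, t+1))^{\mathrm{T}}\,
 \mathrm{Tr}\left[X^{(\xi, \xi)} P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right]}
{\sum_{i \in V} \mathrm{Tr}\left[X^{(\xi, \xi)} P_i(D, \alpha(t), \gamma(t), a(+1, t), a(-1, t), C(+1, t), C(-1, t))\right]}
\quad (\xi \in \Omega), \tag{10.295}
\]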
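where, writing $\theta \equiv (a(+1), a(-1), C(+1), C(-1))$ for brevity, 

\[
\begin{aligned}
\langle s_i|P_i(D, \alpha, \gamma, \theta)|s'_i\rangle
&= \langle s_i|\mathrm{Tr}_{\backslash i} P(D, \alpha, \gamma, \theta)|s'_i\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s'_i, \tau'_i}
 \left(\prod_{j \in V\backslash\{i\}} \delta_{\tau_j, \tau'_j}\right) \\
&\qquad \times \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|P(D, \alpha, \gamma, \theta)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s'_i \in \Omega,\ i \in V), \tag{10.296}
\]

\[
\begin{aligned}
\langle s_i, s_j|P_{ij}(D, \alpha, \gamma, \theta)|s'_i, s'_j\rangle
&= \langle s_j, s_i|P_{ji}(D, \alpha, \gamma, \theta)|s'_j, s'_i\rangle \\
&= \langle s_i, s_j|\mathrm{Tr}_{\backslash\{i,j\}} P(D, \alpha, \gamma, \theta)|s'_i, s'_j\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s_j, \tau_j}\delta_{s'_i, \tau'_i}\delta_{s'_j, \tau'_j}
 \left(\prod_{k \in V\backslash\{i,j\}} \delta_{\tau_k, \tau'_k}\right) \\
&\qquad \times \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|P(D, \alpha, \gamma, \theta)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s_j \in \Omega,\ s'_i \in \Omega,\ s'_j \in \Omega,\ i \in V,\ j \in V,\ i < j), \tag{10.297}
\]

\[
\begin{aligned}
\langle s_i|P_i(\alpha, \gamma)|s'_i\rangle
&= \langle s_i|\mathrm{Tr}_{\backslash i} P(\alpha, \gamma)|s'_i\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s'_i, \tau'_i}
 \left(\prod_{k \in V\backslash\{i\}} \delta_{\tau_k, \tau'_k}\right) \\
&\qquad \times \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|P(\alpha, \gamma)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s'_i \in \Omega,\ i \in V), \tag{10.298}
\]

\[
\begin{aligned}
\langle s_i, s_j|P_{ij}(\alpha, \gamma)|s'_i, s'_j\rangle
&= \langle s_j, s_i|P_{ji}(\alpha, \gamma)|s'_j, s'_i\rangle \\
&= \langle s_i, s_j|\mathrm{Tr}_{\backslash\{i,j\}} P(\alpha, \gamma)|s'_i, s'_j\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s_j, \tau_j}\delta_{s'_i, \tau'_i}\delta_{s'_j, \tau'_j}
 \left(\prod_{k \in V\backslash\{i,j\}} \delta_{\tau_k, \tau'_k}\right) \\
&\qquad \times \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|P(\alpha, \gamma)|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s_j \in \Omega,\ s'_i \in \Omega,\ s'_j \in \Omega,\ i \in V,\ j \in V,\ i < j). \tag{10.299}
\]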


10.5 Quantum Statistical Mechanical Informatics 


This section explains quantum graphical modeling based on quantum mechanical 
extensions of statistical mechanical informatics, referred to as quantum statistical 
mechanical informatics, and in particular advanced quantum mean-field methods. 
Fundamental frameworks and recent developments have been explored in some 
textbooks on statistical mechanics [37, 86]. In applications of quantum annealing 
to massive optimization problems, the transverse Ising model is an important 
quantum probabilistic graphical model [83, 84], and it is known that the density matrices 
in some familiar quantum statistical machine learning systems, for example, those in 
Eqs. (10.263), (10.265), and (10.266), can be reduced to transverse Ising models. 
In quantum statistical mechanical informatics, one of the most important schemes is 
the Suzuki-Trotter decomposition [87, 88]. It has been used to realize quantum Monte 
Carlo methods by mapping d-dimensional density matrices to corresponding 
(d + 1)-dimensional probability distributions [89]. Recently, some quantum annealing 
schemes have been realized as actual quantum computers, for example, the D-Wave 
machine. 

In the first part of this section, we explain some basic frameworks in advanced 
quantum mean-field methods for realizing familiar quantum statistical machine 
learning systems for the transverse Ising models, including conventional frameworks of 
quantum belief propagation. In the second part, we propose a quantum adaptive 
Thouless-Anderson-Palmer (TAP) method and a new approach using the momentum 
space renormalization group method to realize coarse graining for the transverse 
Ising model not only for regular graphs but also for random graphs. In the third part, 
we introduce Suzuki-Trotter decompositions [87, 88], show the basic scheme 
for mapping a d-dimensional transverse Ising model to a (d + 1)-dimensional Ising 
model, and apply the scheme to the message passing rules of the conventional 
quantum belief propagation. 


10.5.1 Advanced Mean-Field Methods for the Transverse 
Ising Model 


This section explores the detailed derivation of the deterministic equations in both the 
quantum mean-field method and the quantum loopy belief propagation method for 
the transverse Ising model [83, 84]. Note that the present frameworks of the quantum 
mean-field method and the quantum loopy belief propagation method are constructed 
in real space, while other familiar frameworks in quantum statistical mechanics such 
as spin wave theory are constructed in momentum space. 
For a graph (V, E) with a set of nodes V and a set of edges E, we consider a density 
matrix P as 

\[
P \equiv \frac{\exp\left(\frac{1}{k_{\mathrm{B}}T}\left(-\frac{1}{2}J\sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2
 - \frac{1}{2}h\sum_{i \in V} (\sigma_i^z - d_i I)^2 + \Gamma\sum_{i \in V} \sigma_i^x\right)\right)}
{\mathrm{Tr}\left[\exp\left(\frac{1}{k_{\mathrm{B}}T}\left(-\frac{1}{2}J\sum_{\{i,j\} \in E} (\sigma_i^z - \sigma_j^z)^2
 - \frac{1}{2}h\sum_{i \in V} (\sigma_i^z - d_i I)^2 + \Gamma\sum_{i \in V} \sigma_i^x\right)\right)\right]}. \tag{10.300}
\]

Because $\sigma_i^z\sigma_i^z = I$, the density matrix in Eq. (10.300) can be reduced to Eqs. 
(10.226) and (10.227) with 

\[
H \equiv -J\sum_{\{i,j\} \in E} \sigma_i^z\sigma_j^z - h\sum_{i \in V} d_i\sigma_i^z - \Gamma\sum_{i \in V} \sigma_i^x. \tag{10.301}
\]

Here, all the nodes j connected with the node i by an edge {i, j} are referred to as 
neighboring nodes of the node i, and the set of all neighboring nodes of the node 
i is denoted by $\partial i$. The quantum probabilistic graphical model in Eqs. 
(10.300) and (10.301) is referred to as the transverse Ising model [83, 84]. 

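First, we explain the conventional quantum mean-field method for the transverse 
Ising model. We introduce a $2^{|V|} \times 2^{|V|}$ trial density matrix R and its 2 x 2 trial reduced 
density matrix $R_i$ for each node i ($\in V$) defined by 

\[
R_i \equiv \mathrm{Tr}_{\backslash i} R = \begin{pmatrix}
\langle +1|R_i|+1\rangle & \langle +1|R_i|-1\rangle \\
\langle -1|R_i|+1\rangle & \langle -1|R_i|-1\rangle
\end{pmatrix}, \tag{10.302}
\]

where 

\[
\begin{aligned}
\langle s_i|R_i|s'_i\rangle
&= \langle s_i|\mathrm{Tr}_{\backslash i} R|s'_i\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s'_i, \tau'_i}
 \left(\prod_{j \in V\backslash\{i\}} \delta_{\tau_j, \tau'_j}\right)
 \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|R|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s'_i \in \Omega,\ i \in V). \tag{10.303}
\]

By using Eq. (10.303), the average $\mathrm{Tr}(\sigma_i^z R)$ can be expressed in terms of the reduced 
density matrix $R_i$ as follows: 

\[
\begin{aligned}
\mathrm{Tr}(\sigma_i^z R)
&= \sum_{s_1 \in \Omega}\cdots\sum_{s_{|V|} \in \Omega}\sum_{s'_1 \in \Omega}\cdots\sum_{s'_{|V|} \in \Omega}
 \langle s_1, s_2, \cdots, s_{|V|}|\sigma_i^z|s'_1, s'_2, \cdots, s'_{|V|}\rangle
 \langle s'_1, s'_2, \cdots, s'_{|V|}|R|s_1, s_2, \cdots, s_{|V|}\rangle \\
&= \sum_{s_1 \in \Omega}\cdots\sum_{s_{|V|} \in \Omega}\sum_{s'_1 \in \Omega}\cdots\sum_{s'_{|V|} \in \Omega}
 \left(\prod_{k \in V\backslash\{i\}} \delta_{s_k, s'_k}\right)
 \langle s_i|\sigma^z|s'_i\rangle
 \langle s'_1, s'_2, \cdots, s'_{|V|}|R|s_1, s_2, \cdots, s_{|V|}\rangle \\
&= \sum_{s_i \in \Omega}\sum_{s'_i \in \Omega} \langle s_i|\sigma^z|s'_i\rangle
 \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}\sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s'_i, \tau_i}\delta_{s_i, \tau'_i}
 \left(\prod_{k \in V\backslash\{i\}} \delta_{\tau_k, \tau'_k}\right)
 \langle \tau_1, \cdots, \tau_{|V|}|R|\tau'_1, \cdots, \tau'_{|V|}\rangle \\
&= \sum_{s_i \in \Omega}\sum_{s'_i \in \Omega} \langle s_i|\sigma^z|s'_i\rangle \langle s'_i|R_i|s_i\rangle \\
&= \mathrm{Tr}(\sigma^z R_i).
\end{aligned} \tag{10.304}
\]

By similar arguments to those for Eq. (10.304), we derive 

\[
\mathrm{Tr}(\sigma_i^x R) = \mathrm{Tr}(\sigma^x R_i). \tag{10.305}
\]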
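Now, we assume that the trial density matrix R is expressed as 

\[
R = R_1 \otimes R_2 \otimes \cdots \otimes R_{|V|}. \tag{10.306}
\]

In this case, the average $\mathrm{Tr}(\sigma_i^z\sigma_j^z R)$ and the entropy $-k_{\mathrm{B}}\mathrm{Tr}(R\ln(R))$ can be expressed 
as 

\[
\begin{aligned}
\mathrm{Tr}(\sigma_i^z\sigma_j^z R)
&= \mathrm{Tr}\left(\sigma_i^z\sigma_j^z (R_1 \otimes R_2 \otimes \cdots \otimes R_{|V|})\right) \\
&= \left(\sum_{s_i \in \Omega}\sum_{s'_i \in \Omega} \langle s_i|\sigma^z|s'_i\rangle \langle s'_i|R_i|s_i\rangle\right)
 \left(\sum_{s_j \in \Omega}\sum_{s'_j \in \Omega} \langle s_j|\sigma^z|s'_j\rangle \langle s'_j|R_j|s_j\rangle\right)
 \prod_{k \in V\backslash\{i,j\}} \left(\sum_{s_k \in \Omega} \langle s_k|R_k|s_k\rangle\right) \\
&= \mathrm{Tr}(\sigma^z R_i)\,\mathrm{Tr}(\sigma^z R_j) \prod_{k \in V\backslash\{i,j\}} \mathrm{Tr}(R_k) \\
&= \left(\mathrm{Tr}(\sigma^z R_i)\right)\left(\mathrm{Tr}(\sigma^z R_j)\right),
\end{aligned} \tag{10.307}
\]

\[
\begin{aligned}
-k_{\mathrm{B}}\mathrm{Tr}(R\ln(R))
&= -k_{\mathrm{B}}\mathrm{Tr}\left((R_1 \otimes R_2 \otimes \cdots \otimes R_{|V|})\ln(R_1 \otimes R_2 \otimes \cdots \otimes R_{|V|})\right) \\
&= -k_{\mathrm{B}}\sum_{i \in V} \mathrm{Tr}\left((R_1 \otimes R_2 \otimes \cdots \otimes R_{|V|})
 (I \otimes \cdots \otimes I \otimes \ln(R_i) \otimes I \otimes \cdots \otimes I)\right) \\
&= -k_{\mathrm{B}}\sum_{i \in V} \mathrm{Tr}(R_i\ln(R_i)) \prod_{k \in V\backslash\{i\}} \mathrm{Tr}(R_k) \\
&= -k_{\mathrm{B}}\sum_{i \in V} \mathrm{Tr}(R_i\ln(R_i)),
\end{aligned} \tag{10.308}
\]

where the normalization $\mathrm{Tr}(R_k) = 1$ has been used, and the second equality in Eq. (10.308) 
follows from applying Eq. (10.254) repeatedly to the tensor product. 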
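The factorization results (10.307) and (10.308) can be confirmed numerically for a product state. The sketch below assumes three nodes and random 2 x 2 positive definite single-node density matrices (hypothetical test data); nodes are indexed from 0.

```python
import numpy as np
from functools import reduce

def herm_fun(A, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.conj().T

def random_density(rng):
    # Hypothetical single-node density matrix: positive definite with unit trace
    Y = rng.normal(size=(2, 2))
    R = Y @ Y.T + 0.1 * np.eye(2)
    return R / np.trace(R)

def site_operator(op, i, n):
    return reduce(np.kron, [op if k == i else np.eye(2) for k in range(n)])

SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
rng = np.random.default_rng(7)
n = 3
R_nodes = [random_density(rng) for _ in range(n)]
R = reduce(np.kron, R_nodes)                       # product ansatz, Eq. (10.306)

i, j = 0, 2
lhs_corr = np.trace(site_operator(SZ, i, n) @ site_operator(SZ, j, n) @ R)
rhs_corr = np.trace(SZ @ R_nodes[i]) * np.trace(SZ @ R_nodes[j])      # Eq. (10.307)

lhs_ent = np.trace(R @ herm_fun(R, np.log))
rhs_ent = sum(np.trace(Rk @ herm_fun(Rk, np.log)) for Rk in R_nodes)  # Eq. (10.308)

print(np.isclose(lhs_corr, rhs_corr), np.isclose(lhs_ent, rhs_ent))
```

For a density matrix that is not a tensor product, both equalities generally fail, which quantifies what the mean-field ansatz discards.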


j=l 


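The free energy functional can be reduced to 

\[
\begin{aligned}
F[R] = F_{\mathrm{MF}}[R_1, R_2, \cdots, R_{|V|}]
&\equiv -J\sum_{\{i,j\} \in E} \left(\mathrm{Tr}\,\sigma^z R_i\right)\left(\mathrm{Tr}\,\sigma^z R_j\right)
 - h\sum_{i \in V} d_i \left(\mathrm{Tr}\,\sigma^z R_i\right) \\
&\quad - \Gamma\sum_{i \in V} \left(\mathrm{Tr}\,\sigma^x R_i\right)
 + k_{\mathrm{B}}T\sum_{i \in V} \mathrm{Tr}(R_i\ln(R_i)).
\end{aligned} \tag{10.309}
\]

We define the optimal reduced density matrix $\hat{R}_i$ for each node i ($\in V$) by 

\[
\hat{R}_i = \arg\mathop{\mathrm{extremum}}_{R_i}
\left\{F_{\mathrm{MF}}[\hat{R}_1, \cdots, \hat{R}_{i-1}, R_i, \hat{R}_{i+1}, \cdots, \hat{R}_{|V|}] \,\middle|\, \mathrm{Tr}R_i = 1\right\}
\quad (i \in V). \tag{10.310}
\]

The simultaneous self-consistent equations for reduced density matrices are 
expressed as 

\[
\hat{R}_i = \frac{1}{Z_i}\exp\left(\frac{1}{k_{\mathrm{B}}T}\left(\left(J\sum_{j \in \partial i} \mathrm{Tr}(\sigma^z \hat{R}_j) + h d_i\right)\sigma^z + \Gamma\sigma^x\right)\right), \tag{10.311}
\]

\[
Z_i \equiv \mathrm{Tr}\left[\exp\left(\frac{1}{k_{\mathrm{B}}T}\left(\left(J\sum_{j \in \partial i} \mathrm{Tr}(\sigma^z \hat{R}_j) + h d_i\right)\sigma^z + \Gamma\sigma^x\right)\right)\right]. \tag{10.312}
\]

From Eq. (10.311), we can derive the following simultaneous self-consistent 
equations for the magnetizations $m_i^z = \mathrm{Tr}(\sigma^z \hat{R}_i)$ ($i \in V$) and $m_i^x = \mathrm{Tr}(\sigma^x \hat{R}_i)$ ($i \in V$): 

\[
m_i^z = \frac{\frac{J}{k_{\mathrm{B}}T}\sum_{j \in \partial i} m_j^z + \frac{h}{k_{\mathrm{B}}T}d_i}
{\sqrt{\left(\frac{J}{k_{\mathrm{B}}T}\sum_{j \in \partial i} m_j^z + \frac{h}{k_{\mathrm{B}}T}d_i\right)^2 + \left(\frac{\Gamma}{k_{\mathrm{B}}T}\right)^2}}
\tanh\left(\sqrt{\left(\frac{J}{k_{\mathrm{B}}T}\sum_{j \in \partial i} m_j^z + \frac{h}{k_{\mathrm{B}}T}d_i\right)^2 + \left(\frac{\Gamma}{k_{\mathrm{B}}T}\right)^2}\right), \tag{10.313}
\]

\[
m_i^x = \frac{\frac{\Gamma}{k_{\mathrm{B}}T}}
{\sqrt{\left(\frac{J}{k_{\mathrm{B}}T}\sum_{j \in \partial i} m_j^z + \frac{h}{k_{\mathrm{B}}T}d_i\right)^2 + \left(\frac{\Gamma}{k_{\mathrm{B}}T}\right)^2}}
\tanh\left(\sqrt{\left(\frac{J}{k_{\mathrm{B}}T}\sum_{j \in \partial i} m_j^z + \frac{h}{k_{\mathrm{B}}T}d_i\right)^2 + \left(\frac{\Gamma}{k_{\mathrm{B}}T}\right)^2}\right). \tag{10.314}
\]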

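The mean-field free energy $F_{\mathrm{MF}}[\hat{R}_1, \hat{R}_2, \cdots, \hat{R}_{|V|}]$ of the present system is expressed 
as 

\[
\begin{aligned}
F_{\mathrm{MF}}[\hat{R}_1, \hat{R}_2, \cdots, \hat{R}_{|V|}]
&= -k_{\mathrm{B}}T\sum_{i \in V} \ln(Z_i) + J\sum_{\{i,j\} \in E} m_i^z m_j^z \\
&= -k_{\mathrm{B}}T\sum_{i \in V} \ln\left(2\cosh\left(\frac{1}{k_{\mathrm{B}}T}\sqrt{\left(J\sum_{j \in \partial i} m_j^z + h d_i\right)^2 + \Gamma^2}\right)\right)
 + J\sum_{\{i,j\} \in E} m_i^z m_j^z.
\end{aligned} \tag{10.315}
\]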

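Next, we extend the above framework for the mean-field method for the transverse 
Ising model to the quantum loopy belief propagation method based on the quantum 
cluster variation method in Ref. [90]. In addition to the 2 x 2 trial reduced density matrix 
$R_i$ for each node i ($\in V$), we introduce for the $2^{|V|} \times 2^{|V|}$ trial density matrix R 
its 4 x 4 trial reduced density matrix $R_{ij}$ for each pair of nodes i and j, defined by 

\[
R_{ij} = R_{ji} \equiv \mathrm{Tr}_{\backslash\{i,j\}} R =
\begin{pmatrix}
\langle +1,+1|R_{ij}|+1,+1\rangle & \langle +1,+1|R_{ij}|+1,-1\rangle & \langle +1,+1|R_{ij}|-1,+1\rangle & \langle +1,+1|R_{ij}|-1,-1\rangle \\
\langle +1,-1|R_{ij}|+1,+1\rangle & \langle +1,-1|R_{ij}|+1,-1\rangle & \langle +1,-1|R_{ij}|-1,+1\rangle & \langle +1,-1|R_{ij}|-1,-1\rangle \\
\langle -1,+1|R_{ij}|+1,+1\rangle & \langle -1,+1|R_{ij}|+1,-1\rangle & \langle -1,+1|R_{ij}|-1,+1\rangle & \langle -1,+1|R_{ij}|-1,-1\rangle \\
\langle -1,-1|R_{ij}|+1,+1\rangle & \langle -1,-1|R_{ij}|+1,-1\rangle & \langle -1,-1|R_{ij}|-1,+1\rangle & \langle -1,-1|R_{ij}|-1,-1\rangle
\end{pmatrix}
\]
\[
(i \in V,\ j \in V,\ i < j), \tag{10.316}
\]

where 

\[
\begin{aligned}
\langle s_i, s_j|R_{ij}|s'_i, s'_j\rangle
&= \langle s_j, s_i|R_{ji}|s'_j, s'_i\rangle \\
&= \langle s_i, s_j|\mathrm{Tr}_{\backslash\{i,j\}} R|s'_i, s'_j\rangle \\
&\equiv \sum_{\tau_1 \in \Omega}\cdots\sum_{\tau_{|V|} \in \Omega}
 \sum_{\tau'_1 \in \Omega}\cdots\sum_{\tau'_{|V|} \in \Omega}
 \delta_{s_i, \tau_i}\delta_{s_j, \tau_j}\delta_{s'_i, \tau'_i}\delta_{s'_j, \tau'_j}
 \left(\prod_{k \in V\backslash\{i,j\}} \delta_{\tau_k, \tau'_k}\right) \\
&\qquad \times \langle \tau_1, \tau_2, \cdots, \tau_{|V|}|R|\tau'_1, \tau'_2, \cdots, \tau'_{|V|}\rangle
\end{aligned}
\]
\[
(s_i \in \Omega,\ s_j \in \Omega,\ s'_i \in \Omega,\ s'_j \in \Omega,\ i \in V,\ j \in V,\ i < j). \tag{10.317}
\]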
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By similar arguments to those for Eq. (10.304), we derive 
$$\mathrm{Tr}(\sigma_{i}^{z}\sigma_{j}^{z}R) = \mathrm{Tr}\big((\sigma^{z}\otimes I)(I\otimes\sigma^{z})R_{ij}\big). \tag{10.318}$$
We now assume that the free energy functional can be expressed as 


FIR] = Fretne[{RilieV}, (Ru, nlli HEE} ] 
=—/ Y T(N Ry) 


{i fJek 
—hY 'd/Tr(a? Ri) – T) То" Ri) 
ieV ieV 
+7 У Tr(R;In(R;)) 
ieV 


kg T у, (Tr(Ry,jIn(Ry,j)) — Tr(R;In(R;)) — Tr(Rjln(R;))) 


{i,jJEE 
=-J M Tr((o?@D (1807) (л) 
{i, ЛЕЕ 
—hY “d;Tr(o* R;) — rŠ Tr(o*R;) 
ieV ieV 
+7 M ТК (К) + ke T$ (0. — [9i Tr(R;ln(R,)). 
{i, jleE ieV 


(10.319) 
We define the reduced density matrix R; for each node i (€V) by 


Ё, = arg extrgmum [nene Rk [fi lieva}, {Rall Лее іт, = 1, Ry = Tr lg, jy (jeak))| 


(keV). (10.320) 


Run = arg extremum f Frene [t [а еу}, [Rin {i, JJEE\{K, n]] 
TrRg, = 1, Ё = Trey Run, Ё = Truco Ra] (dk, IL) € E). 


(10.321) 
To ensure the constraint conditions, we introduce the Lagrange multipliers as 
follows: 


LU{RilieV}, (8:10, j)eE]] = -7 у m(o*enaecRi;) 
.ЛєЕ 
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—hY "diTr(o Ri) — Гу То" Ri) 


ieV ieV 
+7 У Tr(Rijln(Rij)) + kw) (1 — Ji) Tr(Riln(R;)). 
(i, jJeE ieV 
-P GR — 1) — oq) (TR - 1) 
ieV {i JJEeE 
- У TAG (Ri — Tri Rij) - у) TA uj(R; — Tj Ris) 
{i,jJeE {i jJeE 


- У T(R ( — /(0*®1)(1®е®) + kgln(Rij) + iijQI + IGAj ij — Àij aen)) 


{i, jJEE 

DXICI һаа — Го“ + kaT (1 —|ail)In(R;) — Y Aj -м)) 
iev jedi 

Tuc Lug. (10.322) 
ТАЛ {i,jJEE 


Here we remark that $\lambda_{i,ij} = \lambda_{i,ji}$ and $\lambda_{j,ij} = \lambda_{j,ji}$ $(\{i,j\}\in E,\ i<j)$.
We define the reduced density matrix $\hat{R}_{i}$ for each node $i\,(\in V)$ and $\hat{R}_{ij} = \hat{R}_{ji}$ for
each edge $\{i,j\}\in E$ by


Ё = arg exiemum[&,( — һо — Го* — Уау + Та — [9i In(R;) — Ar) 
: jedi 


(eV), (10.323) 


Ё = arg extremum | Ri; ( — JODIDO?) + Ммыу@1 + 1Odj,ij 4 kpTin( Rig) — Ai 1@D) } 
ij 


(Лев). (10.324) 


The simultaneous self-consistent equations for reduced density matrices are
expressed as


Ё eee : l hdjo* — Го* — УХ 
i = ехр{ —1+ — Jex 0“ — Го ы | |. 
р kaT ) т —1 jj 


jedi 


(10.325) 


so Atij) 1 z у), ый 
Rij = exp| —1 + exp (J(c*81)(1807) — мыу®1 — IGXj,ij) |. 
КвТ КвТ j i 
(10.326) 
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ài 1 1 
ехр[1— = Tr| exp - hd;o* — Го“ 205 kj А 
KBT kaT |Ә — 1 i 


Jedi 


(10.327) 


а; 1 
exp(1 = c = UE = Àj; 91 = 19444) . 
B 


BT 
(10.328) 
By introducing the linear transformations 
Мы) = Ài, ji = —hd;o* = Го* = у, А, (10.329) 
kedi\{j} 
Equations (10.325) and (10.326) can be rewritten as 
Roe 1 (ла :+Го* + УГА (10.330) 
i= nexp| — іб o —i | |, . 
z T ^k 
keði 
P 1 1 
Rij = Zan ew( > ( ортоо + h(d;(o7@1) + dj(1807)) + T(o*@l + 1®о*) 
+ Y were Y) те), (10.331) 


keaiNG) IeajNi) 


keði 


Zi = Tr| exp [ко pra y» (10.332) 
1 kg T 1 => , 


1 
20 = Tr exp( > (eende + h(d;(o* @1) + dj(1807)) + T(o* 8I + 1@o*) 


+ kort Y 1934... (10.333) 


keüiNU) IeàjNi) 
Then, by substituting Eq. (10.330) and Eq. (10.331) into 
К; = Try R (10.334) 


ij» 
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we derive the following simultaneous self-consistent equations for the effective 


fields: 
exp Ln TE hdjo* + To* 4 pee i 
кл] T 1 T T > 
kaT 7 cd кєзг\{/} 
Zi 1 
=; Try [ew 277 7 (4e Da8e*) + re (hio +Го*+ У) hj) 
{i,j} ledj\{i} 
1 
| ae А xX А 
1 ir do Кта Y wer). 
kei NU) 
(10.335) 
such that 
ver : hdc? + Го" D> Акы 
Jo 77 1 T T —L 
tut hat keai\ti) 
1 
+и(>=- -mr [eo 7 (7*&Da&e*) + 1@(hdio* prat Y dj) 
tij leaj\{i) 
1 
| „т^ 4 X | Я 
Ei E To* 4 Y uer) |), 
kei NU) 
(10.336) 


Note that Eqs. (10.335) and (10.336) can be regarded as the conventional message
passing rules of quantum loopy belief propagation. The Bethe free energy


а liev}, [Ё, |t, jee} | of the present system is given by 


Ява | RilieV |, [gli Nee} ] = Y - kaTIn(Zi)) 
ieV 
+ У) (-kaTIn(Zi,;) + kgTIn(Zi) -kgTIn(Zj). (10.337) 
(.ЛеЕ 


The conventional quantum message passing rules in Eqs. (10.335) and (10.336) 
reduce to Eqs. (10.95) and (10.96) for the case of Г = 0. 
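Because we have the orthonormal relationships

$$\begin{aligned}
&\mathrm{Tr}[\sigma^{z}\sigma^{z}] = \mathrm{Tr}[\sigma^{x}\sigma^{x}] = 2,\\
&\mathrm{Tr}[\sigma^{z}I] = \mathrm{Tr}[I\sigma^{z}] = \mathrm{Tr}[\sigma^{x}I] = \mathrm{Tr}[I\sigma^{x}] = 0,\\
&\mathrm{Tr}[\sigma^{z}\sigma^{x}] = \mathrm{Tr}[\sigma^{x}\sigma^{z}] = 0,
\end{aligned} \tag{10.338}$$

$$\begin{aligned}
&\mathrm{Tr}[(\sigma^{z}\otimes I)(\sigma^{z}\otimes I)] = \mathrm{Tr}[(\sigma^{x}\otimes I)(\sigma^{x}\otimes I)] = \mathrm{Tr}[(I\otimes\sigma^{z})(I\otimes\sigma^{z})] = \mathrm{Tr}[(I\otimes\sigma^{x})(I\otimes\sigma^{x})] = 4,\\
&\mathrm{Tr}[(\sigma^{z}\otimes I)(\sigma^{x}\otimes I)] = \mathrm{Tr}[(\sigma^{x}\otimes I)(\sigma^{z}\otimes I)] = \mathrm{Tr}[(I\otimes\sigma^{z})(I\otimes\sigma^{x})] = \mathrm{Tr}[(I\otimes\sigma^{x})(I\otimes\sigma^{z})] = 0,\\
&\mathrm{Tr}[(\sigma^{z}\otimes I)(I\otimes\sigma^{z})] = \mathrm{Tr}[(\sigma^{x}\otimes I)(I\otimes\sigma^{x})] = \mathrm{Tr}[(\sigma^{z}\otimes I)(I\otimes\sigma^{x})] = \mathrm{Tr}[(\sigma^{x}\otimes I)(I\otimes\sigma^{z})] = 0,
\end{aligned} \tag{10.339}$$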
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the reduced density matrices R; and R;; = Rj; expand to the following orthonormal 
expansions: 


1 : 
Ri = ym о“ + mia*), (10.340) 
Rij = К: 

1 

ы (o* @1) + mj (o7@1) + тў(1®о* ) + m (1807) 


ii (079 D$?) + cj ,(о*®1Г)(1®е*) 


*qi (6° @1) (07) + (еә) (19е). (10.341) 


where 


xc Se Tee es) 
ту = 10° Rj] = Tr[(1@o") Rij], (ti, j)eE, i < j, vetx, z}, v'e{x, z}). 
с = Тї (о”®Г)(1®о” )Ri;]. 

(10.342) 
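The trace relations in Eqs. (10.338) and (10.339), and the single-node expansion in Eq. (10.340), can be checked numerically. The sketch below is an illustration only: the helper names `expand_single`, `rebuild_single`, and `pair_traces` are hypothetical, and the single-node check assumes a real symmetric $2\times 2$ matrix with unit trace, for which the $\sigma^{z}$ and $\sigma^{x}$ components are sufficient.

```python
import numpy as np

I2 = np.eye(2)
sz = np.array([[1.0, 0.0], [0.0, -1.0]])   # sigma^z
sx = np.array([[0.0, 1.0], [1.0, 0.0]])    # sigma^x

def expand_single(R):
    """Coefficients m^z = Tr[sigma^z R] and m^x = Tr[sigma^x R]."""
    return np.trace(sz @ R), np.trace(sx @ R)

def rebuild_single(mz, mx):
    """Single-node expansion R = (I + m^z sigma^z + m^x sigma^x) / 2."""
    return 0.5 * (I2 + mz * sz + mx * sx)

def pair_traces():
    """Traces Tr[X Y] for all products of the four single-site operators
    sigma^z (x) I, sigma^x (x) I, I (x) sigma^z, I (x) sigma^x."""
    ops = {
        "zI": np.kron(sz, I2), "xI": np.kron(sx, I2),
        "Iz": np.kron(I2, sz), "Ix": np.kron(I2, sx),
    }
    return {(a, b): np.trace(ops[a] @ ops[b]) for a in ops for b in ops}

# A random real symmetric, positive semidefinite matrix with unit trace.
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))
R = A @ A.T
R /= np.trace(R)
mz, mx = expand_single(R)
R_rebuilt = rebuild_single(mz, mx)
table = pair_traces()
```

In `table`, the diagonal entries equal 4 and all other entries vanish, which is the content of Eq. (10.339).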


By using these orthonormal expansions of the reduced density matrices, the Bethe 
free energy functional in Eq. (10.319) can be rewritten as 


FIR] = Рак [mi геу, vetx, a. [e |i Hee, ъ{х, z} vete. a] ] 


= — Ј у, Ci 7 һу &т}—Гу т 


{i,jJEE ieV ieV 
tT *( — Jai |) Tr(R;In(R;)) 
ieV 
+7 у, Tr(Ry,)ln(Ry,;)). (10.343) 
{i, jJEE 


The extremum conditions 


д ; 
а үт! iev, vete, a]. [et tis DEE, vix z} v'etx, a] | 
om; i 


= 0 (keV, ve{x, z}), 
(10.344) 


Pay i, ЛЄЕ, vix, z}, v'etx, 2 


ват іЄУ, ve{x, 3]. [az 


д 
дсп 
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= 0 ((&, Ie E, ve{x,z},, v'eix, z}), 
(10.345) 


can be reduced to the following simultaneous equations: 


h 1 


= $a - oiym[o*m(;)] + m[(s*e1)n(f;)] + 1 YM (1207) in( Ra) ]. 


ae ‚ией MI EY 
вт = ja = өто m(&;)] + е + а а (ern) 
(10.346) 
J dope. © 
UE [(*&1) 8o )i(R | 
0= ташы = 1 (о*®1)(1®о©)и(Ё/)]|, Чї DEE). 
0- Tr (s* e) 18e*)m(f;;)]. 
(10.347) 
where 
P3 1 ax X . 
i= 5 + о“ + mo ©) (i€V), (10.348) 
Rij = Ку 
1 AE A P 
E @ + йй (o* 8I) + (091) + m5 (1®е*) + її (190) 


+ (0° @1)(1@0*) + Ci, (6*@1)(180*) 


+E (0° @1)(1@o*) + GF, (o* эг)(їво*)) (t, ЛЄЕ, i < j). 
(10.349) 


For Г = 0, Eq. (10.347) with Eqs. (10.348) and (10.349) reduces to Eqs. (10.107) 
and (10.108) with Eqs. (10.72) and (10.102). 

Before finishing the present subsection, we briefly review another framework 
of the quantum advanced mean-field method. As we mentioned above, advanced 
quantum mean-field methods have also been formulated in the momentum space. One 
familiar formulation is spin wave theory [91]. A general formulation of the quantum 
cluster variation method from the viewpoint of spin wave theory was proposed in 
Refs. [92, 93]. 
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10.5.2 Real-Space Renormalization Group Method for the 
Transverse Ising Model 


We now present sublinear modeling in statistical machine learning procedures by 
using the real-space renormalization group method for the transverse Ising model in 
Eq. (10.301) on the ring graph (V, E) of Eq. (10.176) for the case of $|V| = 2^{L}$ and
h = 0. The present scheme follows the one in Refs. [37, 94]. Some extensions of the 
present frameworks for the ring graph in Eq. (10.176) to higher-dimensional graphs 
such as the torus graph may be available according to the frameworks of Ref. [94]. 

The important part of the transverse Ising model in Eq. (10.301), 
—J (o*&I)(1&G0*) — Г(о* GI) can be diagonalized as 


-Jo -r o0 
—/(о°®1)(1®о*) – г(о*ә)= © у 0 
0 -r 0 —J 
(euh) 0 i :(1- улт) 
_ 0 fa zm) "HC кутт) : 
Him) — 9 0 alty) 
о lita) iru) о 
-vJ +7 0 0 0 
0 -JP+T2 0 0 
* 0 о VPs? 0 
0 0 0 Уљ+г? 
T 
Үт = Ж aes) 
от) н) _ 
a(l- ye) 0 0 (và) 
^ qiu) (т) 9 
(10.350) 


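The eigenvalues $\varepsilon_{1} = \varepsilon_{2} = -\sqrt{J^{2}+\Gamma^{2}}$, $\varepsilon_{3} = \varepsilon_{4} = +\sqrt{J^{2}+\Gamma^{2}}$ have the relationship
$\varepsilon_{1} = \varepsilon_{2} < \varepsilon_{3} = \varepsilon_{4}$ and their corresponding eigenvectors are given by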
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1 EE Е 0 
i Em) E iQ - E E 
п) = ‚у= И 
(б-т) f+ e) 
i - — Г // 00350 
_ TER « - улт) 
i o Jr? Em 0 
TENE ү sae) 
+ 1(1- mu) 0 


To realize the coarse graining of the present transverse Ising model for the case of 


zero temperature Т = 0 for the density matrix P in Eq. (10.300), we introduce the 
following projection operator: 


P” = P@Pe---@P, 
< 


(10.352) 
2L P's 
where 
P= КШ +122) = |) 
1+ уут) 0 TEES 
0 £i ym) 0 E 
(10.353) 
Because it is valid that 
-J 0 -Fr 0 
BS 54 е7 =-/Л+г?( ©) top) )e- teri. (10.354) 
0-r 0 -J 


we can derive the following equalities 
о”) z 2n 
P; ( = Joy 405 — rož) P! 


Tensor Products of (i—1) Matrices (IQI) 
=P?( (т®1)®(1®1)®---®(1®1) ®(— /(о*®Г)(1®о°) - T(0*81)) 


T 
ә (rereuere-e(mer PP? 
: 


Tensor Products of (24—!—i) Matrices (IQI) 
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Tensor Products of (i —1) Matrices PUGDP) 


= ((üenP')e(PaenP")e---e(Paenr"))e(P(- J(o*@1) (180°) - r(e*er)p") 
e((Paenr')e(Paenr")e--e(Pusnr")) 
—————————— 
Tensor Products of (247! —i) Matrices (Р(Т®Г)РТ) 


Tensor Products of (i —1) Matrices PUGDP) 


- ((aenr)s(raenET)e- --@(PU@DP"))@(P(- J(o7@1) (1807) – r(e*a1))P") 
e((Paenr')e(Paenr")e--e(Pusnr")) 


Tensor Products of (2-—! —i) Matrices (Р(Т®Г)РТ) 


@—1) I's 
= (rere--er)e( - Л? + г°г)®(1®г®:.-®Г), (10.355) 
abs I's 


T 
peo — Гоо — 73 
Tensor Products of (i —1) Matrices (IQI) 

————— 
= BE (rer)e(rer)e---a(Ia1) 

@( — /(1®о°*®1®1)(1®1®е*®Г) — Г(1®о*®1®Г)) 

T 
ә _ (1®1)®(1®1)®---®(1®Г) ype 
Tensor Products of (2--! —i—1) Matrices (181) 


Tensor Products of (i—1) Matrices (P(2& DT) 
2 


= (Paens')e(Paen?")e---e(PaenP")) 


ө((вәғт)( - /(1®о°®1®1)(1®@1®о°®Г) — r(ree*erer)(rar")) 


e((PaenP")e(Paenr")e. : e(Paenr)) 


Tensor Products of (247! —i—1) Matrices (Р(Т®Г)РТ) 


Tensor Products of (i—1) Matrices (Р(Т®Г)РТ) 


^ 


= ((Paepr')e(raepr")e---e(Paenr")) 


e( — J((P(1@0?)P")@(PU@NP")) ((PUBDE")@(P(o7@1)P")) — r(zrec*) eraenr?)) 


e((PaenP")e(Paenr")e. : e(Paenr")) 


- 


Tensor Products of (2--! —i—1) Matrices (Р(1®Г)РТ) 
(i—1) I's 
nt 2 2 
= (1®1®---®1)®( — —— (0@1)(1@0*) - ———(o' 8I 1@1®---81), 10.356 
(rere ar)a( sense) mpge90)e(ere ә) ( ) 


QL-1-i-1) I's 


T 
Q^) Za x \р02) 
P; (= Jojo; — Го» JP; 
Tensor Products of (2^-! —2) Matrices (IQI) 


L ————— 
=P? ( - (әрә (181) @(181)®---@(1@1) euer) 
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Tensor Products of (2--! —2) Matrices (IQI) 
a— eo 
x(aene (1®1)®(1®1)®-.-®(1®Г) әй?) 
Tensor Products of (24~!—2) Matrices (IQI) 


r L 
-raene (181)@(1@1)®---@(1@1) a(Ieo*) rf _ 


5 (21—12) I's (21-12) I's 
J О — а —— 
= -p (e(o. . 81) @1) (19(T818: ; -@1) 0°) 
г2 (21-12) I's 
У 
S Fe (le (tele 1) ао"). (10.357) 


By using these equalities, the first step of the renormalized energy matrix Н = 


T 
POH pe) can be reduced as follows: 


HCD = pe ype" 
(01-1) T's 
E RD ак r*(rere- 91) 


= (i—1) I's 
2L 1 2 


-YY(rere-. а) е 25 (een) (160°) + a (e1) )o(tar0: 91) 
2- JPBXT Py ыш 


(2L-1—i—1) I's 


Я (21-12) I's (20—12) I's 
- EIS (s*e(Tere--ei)er)(re(rere--.1)ee*) 
В or) I's 
5 eere). (10.358) 


By similar arguments to those for the above procedure, the r-th step of the renor- 
; : = L- L-r4l L LT L-r+1)!_oL-r)T 
malized energy matrix н@ ЭЭ = (ee per). . pe a(z? ) per) per) 


can be reduced to the following recursion formulas: 


QL") 5 perry (2E-r+1) (Qh-rH1yT 
HO) =р0 "0н р! 


QL-77) 1% 
m oL oC) (rere i e) 
T 1 
21-1 @—1) I's 


-}(Т1®1®-. ei Je(7 (c*&1)(18a*) +r” (*er)Je(rere: 81) 
i=l ENDE. 
Q-7r-i- 's 


(24-2) 1% QL-7—2) I's 


-JO (0:e(1@1e--e1)er) (19(1@1@-.-@1) ао) 
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(24-2) I's 
Е ге(гә(тагег 1) әс"), (10.359) 
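where

$$J^{(r)} = \frac{\big(J^{(r-1)}\big)^{2}}{\sqrt{\big(J^{(r-1)}\big)^{2} + \big(\Gamma^{(r-1)}\big)^{2}}},\qquad
\Gamma^{(r)} = \frac{\big(\Gamma^{(r-1)}\big)^{2}}{\sqrt{\big(J^{(r-1)}\big)^{2} + \big(\Gamma^{(r-1)}\big)^{2}}}, \tag{10.360}$$

$$\varepsilon^{(r)} = -\sqrt{\big(J^{(r-1)}\big)^{2} + \big(\Gamma^{(r-1)}\big)^{2}}, \tag{10.361}$$

$$H^{(0)} = H,\qquad J^{(0)} = J,\qquad \Gamma^{(0)} = \Gamma. \tag{10.362}$$

The inverse of the real-space renormalization group is given by

$$J^{(r-1)} = \sqrt{J^{(r)}\big(J^{(r)} + \Gamma^{(r)}\big)},\qquad
\Gamma^{(r-1)} = \sqrt{\Gamma^{(r)}\big(J^{(r)} + \Gamma^{(r)}\big)}. \tag{10.363}$$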

If the hyperparameters $J^{(r)}$ and $\Gamma^{(r)}$ in the $r$-th renormalized energy matrix $H^{(r)}$
have been estimated from given data vectors by using the QEM algorithm for a
renormalized density matrix on ring graphs $(V^{(r)}, E^{(r)})$, we can estimate the
hyperparameters $J^{(0)} = J$ and $\Gamma^{(0)} = \Gamma$ of the transverse Ising model (10.301) on the ring
graph (V, E) of Eq. (10.176) for the case of $|V| = 2^{L}$ and $h = 0$ by using the inverse
transformation rule of the real-space renormalization group procedure (10.363).
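The forward recursion in Eq. (10.360) and its inverse in Eq. (10.363) are simple enough to check numerically. The following sketch is illustrative only: the function names are hypothetical, and it assumes $J > 0$ and $\Gamma > 0$ so that the square roots in the inverse rule are well defined. It also verifies that the two-site matrix $-J(\sigma^{z}\otimes I)(I\otimes\sigma^{z}) - \Gamma(\sigma^{x}\otimes I)$ has the doubly degenerate eigenvalues $\mp\sqrt{J^{2}+\Gamma^{2}}$.

```python
import math
import numpy as np

def rg_step(J, G):
    """Forward step: (J, Gamma) at level r-1 -> (J, Gamma, epsilon) at level r."""
    s = math.hypot(J, G)           # sqrt(J^2 + Gamma^2)
    return J * J / s, G * G / s, -s

def rg_inverse_step(Jr, Gr):
    """Inverse step: (J, Gamma) at level r -> (J, Gamma) at level r-1."""
    s = Jr + Gr                    # equals sqrt(J^2 + Gamma^2) of level r-1
    return math.sqrt(Jr * s), math.sqrt(Gr * s)

def rg_flow(J, G, steps):
    """List of (J, Gamma) pairs for levels 0, 1, ..., steps."""
    flow = [(J, G)]
    for _ in range(steps):
        J, G, _ = rg_step(J, G)
        flow.append((J, G))
    return flow

def rg_inverse_flow(Jr, Gr, steps):
    """Undo `steps` renormalization steps starting from level-r values."""
    for _ in range(steps):
        Jr, Gr = rg_inverse_step(Jr, Gr)
    return Jr, Gr

def two_site_block(J, G):
    """-J (sigma^z x I)(I x sigma^z) - Gamma (sigma^x x I) as a 4x4 matrix."""
    I2 = np.eye(2)
    sz = np.array([[1.0, 0.0], [0.0, -1.0]])
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    return -J * np.kron(sz, sz) - G * np.kron(sx, I2)

# Example: renormalize three times, then recover the original hyperparameters.
J0, G0 = 1.3, 0.7
flow = rg_flow(J0, G0, 3)
J_back, G_back = rg_inverse_flow(*flow[-1], 3)
```

In a setting like the one described above, `flow[-1]` would stand in for the hyperparameters estimated on the coarse-grained ring, and `rg_inverse_flow` would map them back to the original model.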


10.5.3 Sublinear Modeling Using a Quantum Adaptive TAP 
Approach and Momentum Space Renormalization 
Group in the Transverse Ising Model 


This section proposes a novel scheme that applies momentum space renormalization
group approaches to adaptive Thouless-Anderson-Palmer (TAP) approaches for
the transverse Ising model on random graphs. The adaptive TAP approach is a familiar
advanced mean-field method for probabilistic graphical models, and many
extensions have been proposed [95-98]. Furthermore, sublinear modeling for the
EM procedure in probabilistic graphical models has been realized by introducing
momentum space renormalization group approaches [99, 100]. The method
proposed in this section is formulated by combining the adaptive TAP approaches
with the momentum space renormalization group approaches. Moreover, our method
is applicable not only to regular graphs but also to random graphs.

The density matrix P in Eq. (10.300) can be rewritten as 


J A h IV? 1 "iy? 
_ z z z та} _ x pm 
E 2kgT n (si sj) 2kgT (si 1! ) 2kgT Ys ds 
ЛЄЕ ieV ieV 


J z A2 A , Wh\? 1 $ ivi\2) | 
Е? > (s; ei) zie а1@ | = xl = г? ) ) 
(10.364) 


The density matrix $P$ satisfies the following minimization of the free energy functional:

$$P = \arg\min_{R}\big\{\mathcal{F}[R]\ \big|\ \mathrm{Tr}R = 1\big\}. \tag{10.365}$$


F[R] = ZJ A Trl (e; - of) R] * уо = dit) к 


(i, jJJEE ieV 
І 
+5) (о — r12")? + каттоо ә). (10.366) 
ieV 


: 2 
Because all the off-diagonal elements of СА — oi) are zero, we have 


2 
= 5 Y Yo eset oj Issa) 


{i ,jIe Es, EQ52EQ siyje 
ж ($1, 52, Sivil [51, 52, 7 5 8р) 


1 
FALDO D бз узу (ор — 41%), 82, ++ зу) 


ieV 5,;EQ52EQ siyjeQ 


ж ($1, 52, Sii [51, 52, ++, 8р) 


1 Я 
+5) (оў — гї? ?/R| + kg T Tr[RIn(R)] 


ieV 


РУУ» $6 s; — sj) зт, эз, sivi Ё]з1, 52; 77 sivi) 


{i JE Es, EQ52EQ syyjeQ 


клу У Gi — (в, sy s svi R181, 82, +++ Sivi) 


ієҮѕє9%є9 syyjEQ 
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І е 
+5) (оў - r72") n] ke TTAR). (10.367) 


ieV 
By introducing the reduced density matrix R; in Eq. (10.275) and 


(51.52, sivi) = (51, 52, svi Ё|$1, 52, svi). (Gio 82, °° sven), 
(10.368) 


and by extending plsi, 52,775, sivi) to 
p$) = p(ó1. фә, Фу]) (o = (1, d». фи) Соо, coo)", (10.369) 
the free energy functional can be expressed as 
+оо (+оо 2 
M / ТИ “(п (Elk — 1) + 8(фк 4 DC фу) p(d)déidda- - -dovi 
ЛЕЕ 


+оо (+оо 
Aum / Z “(п (85x — D + 8(ф 4 D di p(@)ddiddr---ddyy) 
Ы keV 


ieV 


E» o? ГІ"), ;] + kg T Tr[RIn(R)]. (10.370) 


ieV 


Now we consider the following approximate free energy: 


+оо р+оо 
Frare ralo (Ri бїгє] = 54 у) ff fe )%е(ф)4ф\аф»---4дфуу\ 
{i,jJek 
+оо р+оо 
E f ТИС 4) оФ)афаф афи 
ieV 
zy Td (of — r19 ^m, 
У 1 


+оо (+оо +оо 
+Кв rf f d J P(P)In(p(b))ddidgy: - Афу 


+оо 
чыту (пак - f pi(i)In(0i(#)) dd). (10.371) 


iev 
where 
nens [7 [7 ТИСЕ p(91. 05. у) 41а афу 
GEV, ф;є(—оо, +оо)). (10.372) 


The reduced density matrix $R_{i}$ and the marginal probability density functions $\rho_{i}(\phi_{i})$
and $\rho(\boldsymbol{\phi})$ need to satisfy the consistencies
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+оо +оо 
E eL. dip dide. -dvi = Ls dipi (Pidhi = Tro? R; GeV), 


he т Ф°о(@ф)аф\аф: гади = з Фф? pi ф)аф = 1, (10.373) 


and the normalizations 


+оо f+ +00 
| | vf p(p)db1dg2:--ddyy| = 1, 


pi (oj )do; = 1 (ieV). 
TrR; = 1. 


(10.374) 


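$R_{i}$, $\rho_{i}(\phi_{i})$, and $\rho(\boldsymbol{\phi})$ are determined so as to minimize the above approximate free energy
$\mathcal{F}_{\mathrm{Adaptive\ TAP}}[\rho,\{R_{i},\rho_{i}|i\in V\}]$ under the constraint conditions in Eqs. (10.373) and (10.374). We
introduce Lagrange multipliers $\boldsymbol{f} = (f_{1},f_{2},\cdots,f_{|V|})^{\mathrm{T}}$ and $\boldsymbol{g} = (g_{1},g_{2},\cdots,g_{|V|})^{\mathrm{T}}$, $D$, $L$, $\lambda$, and $\lambda_{i}$ to ensure
the constraint conditions in Eqs. (10.373) and (10.374) as follows: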


É Adaptive TAP[ 2. (Ri. ilie V]] 


= FAdaptive ТАР[о, (Ri. pili V]] 


oo oo oo +оо 
-Y'u ( [i [o [oso | m) 
icv —oo оо —oo —oo 
ayy (/ [С]; шюФ®деав tv - Tra?) 
ieV =09 99 =99 
“(у Ni e eto ation: аду ] 
ieV 99 99 


-L E $i° pi Фдаф = ) 


ieV 


AT LS Forma 


ЕА (JZ ei eode, —1). (10.375) 


ieV 


By taking the first variation of the approximate free energy $\mathcal{L}_{\mathrm{Adaptive\ TAP}}[\rho,\{R_{i},\rho_{i}|i\in V\}]$
with respect to the marginals, we can derive the approximate expressions of $\hat{R}_{i}$, $\hat{\rho}_{i}(\phi_{i})$,
and $\hat{\rho}(\boldsymbol{\phi})$ as follows:


(eV), (10.376) 
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Рф) 
(st - O3 OU *a)é - Pie -ay- Pie -) 
7 Га IPEA Yu tae z zie La 27 re M | 
(10.377) 
1 1 1 
ew ;( 51%” + 8i9j 5h (oi 4y)) 
АФ) = че | | (10.378) 
2 ads | | А 
f ome (= tt sihi- зне —4)?) Ja 
where $C$ is the $|V|\times|V|$ matrix in which the $(i,j)$-elements are defined by

$$C_{ij} = \begin{cases}
|\partial i| & (i = j),\\
-1 & (\{i,j\}\in E),\\
0 & (\text{otherwise}),
\end{cases} \tag{10.379}$$

for any nodes $i\,(\in V)$ and $j\,(\in V)$. Equations (10.376), (10.377), and (10.378) can be
rewritten as


_ r 
Ё = 1 fi +f fF T n fit fi T2 
2cosh( xd Vf? + m) 2y fi? +T? TEE [72472 1 
vV f2 +r? 1 И 
" exp(+ 27 fe +P ) 0 4 Туут? 
1 2 г : 
0 шл УД Umm ! 


(10.380) 


Las _ [det((h + DIY! + JC) 
ep) | Оту 


ew( aU (h+ DV 1)" Qf e gh) (®+ DI + JC) 


x (o —((h+ DI + 7C) (f +g + һа). (10.381) 


_ h+L 1 gi - hdi V? . 
бф) = ү ew ar’ Dh- ж; )) че», (10.382) 


The Lagrange multipliers $\boldsymbol{f}$, $\boldsymbol{g}$, $L$, and $D$ are often referred to as the effective fields and
are determined so as to satisfy the consistencies in Eq. (10.373), which reduce to the
following simultaneous equations:
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g hd =(h+L)((D- 11 „усу у, (10.383) 


fi 1l /¢2 2 
DIS tanh (hy Л Г ) 

Р 1 /g2 2 
Je tanh ( TBT h Г ) 


f+g+hd = ((р Lr lV) 4 Ic)" fr 


fivi 1 -2 p2 


(10.384) 


1 


L=-h+-4 i | Tail tg hd) (f +g 4 hd), (10.385) 


І 1 
bei dU +++ +4) VI 


Tr(((h + D)IIVI 4 Jey’). 


(10.386) 
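The real symmetric matrix $C$ is diagonalized as

$$C = U\Lambda U^{-1}, \tag{10.387}$$

$$\Lambda = \begin{pmatrix}
\lambda_{1} & 0 & 0 & \cdots & 0\\
0 & \lambda_{2} & 0 & \cdots & 0\\
0 & 0 & \lambda_{3} & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & \lambda_{|V|}
\end{pmatrix}, \tag{10.388}$$

where $\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}\geq\cdots\geq\lambda_{|V|}$. All the eigenvalues $\lambda_{1},\lambda_{2},\cdots,\lambda_{|V|}$ are always real numbers.
For the eigenvector $\boldsymbol{u}_{i} = (U_{1i},U_{2i},\cdots,U_{|V|i})^{\mathrm{T}}$ corresponding to the eigenvalue $\lambda_{i}$ such that
$C\boldsymbol{u}_{i} = \lambda_{i}\boldsymbol{u}_{i}$, for every $i\in\{1,2,3,\cdots,|V|\}$, the matrix $U$ is defined by

$$U = (\boldsymbol{u}_{1},\boldsymbol{u}_{2},\boldsymbol{u}_{3},\cdots,\boldsymbol{u}_{|V|}) = \begin{pmatrix}
U_{11} & U_{12} & U_{13} & \cdots & U_{1|V|}\\
U_{21} & U_{22} & U_{23} & \cdots & U_{2|V|}\\
U_{31} & U_{32} & U_{33} & \cdots & U_{3|V|}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
U_{|V|1} & U_{|V|2} & U_{|V|3} & \cdots & U_{|V||V|}
\end{pmatrix}. \tag{10.389}$$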

It is known that $U$ is a unitary matrix that satisfies $U^{-1} = U^{\mathrm{T}}$ for the real symmetric
matrix $C$. By using the diagonal matrix $\Lambda$ and unitary matrix $U$, the density matrix R


in Eq. (10.390) can be represented as follows: 


1 T (21У1) (211) 1 T 
——— hI А)®1 — — 
exp( T? (( vu )® Je ът Е 


"= 1 Ivi T7 1 (10.390) 
г —— cT ( (h1 ? QVby, l ¿T 
nC a e ee „г 
where 
б с di 
z 
d 
= Я = (07910) EN MD + ллу 2 | feel”, 
Av| oy dig 
(10.391) 
& ot т@!У!У) 
& ox т@!У!У) 
=|. [=(uter™)) 1 a |. (10.392) 
&vi oly] КА 


By using the Gram-Schmidt orthonormalization in the framework of Fig. 10.17, 
we introduce a new unitary matrix 


О 02 Олз ses бӯ] 
Фр Ux» Оз +++ Uy 


Uat, |=(й„й›.йз, |). (10.393) 


Q 
ll 
= 
E 
ne) 
© 
„ө 
чө 


Uu UP Uva Uii 


[Figure: panels labeled "Graph Laplacian Matrix", "Unitary Matrix of L", and "New Graph Laplacian Matrix" for a six-node example graph.]

Fig. 10.17 Momentum space renormalization group for graphical models on random graphs

where


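$$\boldsymbol{v}_{1} = \begin{pmatrix}U_{11}\\ U_{21}\\ \vdots\\ U_{|\tilde{V}|1}\end{pmatrix},\quad
\boldsymbol{v}_{2} = \begin{pmatrix}U_{12}\\ U_{22}\\ \vdots\\ U_{|\tilde{V}|2}\end{pmatrix},\quad\cdots,\quad
\boldsymbol{v}_{|\tilde{V}|} = \begin{pmatrix}U_{1|\tilde{V}|}\\ U_{2|\tilde{V}|}\\ \vdots\\ U_{|\tilde{V}||\tilde{V}|}\end{pmatrix}, \tag{10.394}$$

$$\begin{aligned}
\boldsymbol{u}'_{1} &= \boldsymbol{v}_{1}, & \tilde{\boldsymbol{u}}_{1} &= \frac{\boldsymbol{u}'_{1}}{\sqrt{\boldsymbol{u}'^{\mathrm{T}}_{1}\boldsymbol{u}'_{1}}},\\
\boldsymbol{u}'_{2} &= \boldsymbol{v}_{2} - \frac{\boldsymbol{u}'^{\mathrm{T}}_{1}\boldsymbol{v}_{2}}{\boldsymbol{u}'^{\mathrm{T}}_{1}\boldsymbol{u}'_{1}}\boldsymbol{u}'_{1}, & \tilde{\boldsymbol{u}}_{2} &= \frac{\boldsymbol{u}'_{2}}{\sqrt{\boldsymbol{u}'^{\mathrm{T}}_{2}\boldsymbol{u}'_{2}}},\\
\boldsymbol{u}'_{3} &= \boldsymbol{v}_{3} - \frac{\boldsymbol{u}'^{\mathrm{T}}_{1}\boldsymbol{v}_{3}}{\boldsymbol{u}'^{\mathrm{T}}_{1}\boldsymbol{u}'_{1}}\boldsymbol{u}'_{1} - \frac{\boldsymbol{u}'^{\mathrm{T}}_{2}\boldsymbol{v}_{3}}{\boldsymbol{u}'^{\mathrm{T}}_{2}\boldsymbol{u}'_{2}}\boldsymbol{u}'_{2}, & \tilde{\boldsymbol{u}}_{3} &= \frac{\boldsymbol{u}'_{3}}{\sqrt{\boldsymbol{u}'^{\mathrm{T}}_{3}\boldsymbol{u}'_{3}}},\\
&\ \ \vdots\\
\boldsymbol{u}'_{|\tilde{V}|} &= \boldsymbol{v}_{|\tilde{V}|} - \frac{\boldsymbol{u}'^{\mathrm{T}}_{1}\boldsymbol{v}_{|\tilde{V}|}}{\boldsymbol{u}'^{\mathrm{T}}_{1}\boldsymbol{u}'_{1}}\boldsymbol{u}'_{1} - \cdots - \frac{\boldsymbol{u}'^{\mathrm{T}}_{|\tilde{V}|-1}\boldsymbol{v}_{|\tilde{V}|}}{\boldsymbol{u}'^{\mathrm{T}}_{|\tilde{V}|-1}\boldsymbol{u}'_{|\tilde{V}|-1}}\boldsymbol{u}'_{|\tilde{V}|-1}, & \tilde{\boldsymbol{u}}_{|\tilde{V}|} &= \frac{\boldsymbol{u}'_{|\tilde{V}|}}{\sqrt{\boldsymbol{u}'^{\mathrm{T}}_{|\tilde{V}|}\boldsymbol{u}'_{|\tilde{V}|}}}.
\end{aligned} \tag{10.395}$$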
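The two building blocks used here, diagonalizing the matrix $C$ of Eq. (10.379) and Gram-Schmidt orthonormalization of a set of column vectors, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Fig. 10.17 itself: the function names are hypothetical, the six-node edge list is read off the Laplacian shown in the figure (its fourth row is inferred from symmetry), and the Gram-Schmidt routine is demonstrated on a random matrix because which rows and columns of $U$ are retained is not specified here.

```python
import numpy as np

def graph_laplacian(n, edges):
    """C_ii = degree of node i, C_ij = -1 for each edge {i, j}, 0 otherwise."""
    C = np.zeros((n, n))
    for i, j in edges:
        C[i, i] += 1.0
        C[j, j] += 1.0
        C[i, j] -= 1.0
        C[j, i] -= 1.0
    return C

def sorted_eigendecomposition(C):
    """Eigenvalues in descending order and the matching orthogonal matrix U."""
    lam, U = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return lam[order], U[:, order]

def gram_schmidt(V):
    """Classical Gram-Schmidt on the columns of V; returns orthonormal columns."""
    basis = []
    for k in range(V.shape[1]):
        u = V[:, k].copy()
        for q in basis:
            u -= (q @ V[:, k]) * q      # remove the component along q
        norm = np.linalg.norm(u)
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        basis.append(u / norm)
    return np.column_stack(basis)

# Six-node example (0-based node labels; assumption: reconstructed from Fig. 10.17).
EXAMPLE_EDGES = [(0, 1), (0, 3), (0, 5), (1, 2), (2, 3),
                 (2, 5), (3, 4), (3, 5), (4, 5)]
C = graph_laplacian(6, EXAMPLE_EDGES)
lam, U = sorted_eigendecomposition(C)

# Gram-Schmidt demonstrated on a generic (random) set of column vectors.
Q = gram_schmidt(np.random.default_rng(0).normal(size=(4, 4)))
```

Classical Gram-Schmidt mirrors the formulas in the text; for numerically delicate inputs a QR factorization such as `numpy.linalg.qr` is usually the more stable choice.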


By using the new unitary matrix 0 and a diagonal matrix A 
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м 0 0 0 
0л 0 0 
A=] 0 0з. 0 | (10.396) 
ооо А 


we introduce a renormalized density matrix $\tilde{P}$ from the standpoint of the momentum
space renormalization group for general graphs as


exo ( xr (quo D 4 Jior” ba xe 
Ps 87 5; т (10.397) 
Eu Q^!) Ql" 1) 
EE а (о! + ЈА) QI КЁ) 
where 
ü oF di 
[$ " of Е i " 
t= А = (ter) ^ - (n1 V + ЈА) 0 2 era b. 
" А 2 
D M а ӯ 
(10.398) 
[7 оў 19111) 
_ | & sa] оў те!Ўўр 
ESI е (utar) oN cag | | (10.399) 
5g oF HEAT, 
д, би б» бз Us Uy, Un Шз +--+ Шуп а 
d» Un Un Оз 5 Uz7 Шо Un Uz ··· Шур d» 
Я = Оз U32 U3 > Usi, Ui Un U33 ©. Uivis d3 
i Uia Ui ©ўз Yr Шү Ug, Ua, °° Uv \aivi 


(10.400) 


For this density matrix P in Eq. (10.397), we can formulate the approximate reduced 


density matrix R; and the approximate Gaussian marginal probability density func- 
tion #(ф) = (фі, фә. oj) and 2(ф;) for the corresponding quantum adaptive TAP 


approximation as follows: 
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2 1 = 


à 1 fit RT? E WES, 
i r 
1 {72,72 | f24r => Тә э 
2oosh тїт Г +r ) УЛ +T dn f 4r2 
r 
1 [2,p 1 d 
exp| 4 т 7412) 0 72 
X P| T EBT fi А m +r 


1 72 = Ss 
«© (aP) m 


(10.401) 


" det((h + D) Vb + JK) 
P(b1. 02. og = 
on IVI (QrkgT)!¥| 


хо э? (# C + Б)гї + A) waa) 
(0 + BAW) + 5) 07 


хб 
«(8- eo + Dy + A) ep ge 2 


(10.402) 


x A+L 1 nl, Si hdi 
aes T roc b(» c ji Je (10.403) 


The reduced density matrix Qi and the marginal probability density functions 4; (¢;) 
and AG Des Фй) need to satisfy the consistencies 
+оо 


+оо р +оо +оо v P ^ m 
f [| «f Qip (o. фә, “++, Фӯ) 414%. аф = ф:0,(фг)4ф: = Tra? К; (ieV), 
99 А09 од O +оо тр 

El J «f P(o 447) tton dog У) x 9/9, (ODd: = 1. 

ieV = Ed ieV 


(10.404) 


The Lagrange multipliers f, $, Č, and D are determined so as to satisfy the consis- 
tencies in Eq. (10.404), which reduce to the following simultaneous equations: 


жы —1 
ъи (+ (qp - Zr" +) any (10.405) 
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І 
<В 
Е h tah ( i f2 г) 


м fpes ~ Lad AN aw = 
fegenasU((D-Tu o. X) OT) Jm 


Ўт 
IVI 
(10.406) 
Е 1 1 lí 2 ATxx x 
l= nate [b+ a rre (£g end), (10.407) 


= = (6 Dro 7x) | 
14 Fk (f gena) (£g nd) m 


(10.408) 


10.5.4 Suzuki-Trotter Decomposition in the Transverse Ising 
Model 


In this section, we review the Suzuki-Trotter formulas and extensions from con- 
ventional quantum loopy belief propagation using them. In quantum probabilistic 
graphical models, the state space is defined by all the eigenvectors of the density 
matrix R and the probability of each eigenvector is given by the eigenvalue as men- 
tioned in Sect. 10.4.2. To compute some statistical quantities by using the Monte 
Carlo method, it is necessary to diagonalize the energy matrix H, which is a massive 
computation. Instead of such a scheme, quantum Monte Carlo methods based on 
the Suzuki-Trotter formulas were proposed [89]. One important part of the quan- 
tum Monte Carlo method is the mapping from a quantum probabilistic graphical 
model to a conventional (classical) probabilistic graphical model by introducing the 
techniques of Suzuki-Trotter decompositions. It is known that some statistical quan- 
tities for conventional (classical) probabilistic graphical models can be computed by 
MCMC methods. This is a basic idea behind quantum Monte Carlo methods. Let 
us first review the Suzuki-Trotter formulas and explicitly give a detailed scheme of 
Suzuki-Trotter decompositions for the transverse Ising model in Eqs. (10.226) and 


(10.227) with Eq. (10.301). 
From the definition of the exponential function for square matrices, we have

$$\exp(x(A+B)) = I + x(A+B) + \frac{x^{2}}{2}(A+B)^{2} + O(x^{3})\quad(x\to 0), \tag{10.409}$$

$$\exp(xA) = I + xA + \frac{x^{2}}{2}A^{2} + O(x^{3})\quad(x\to 0), \tag{10.410}$$

$$\exp(xB) = I + xB + \frac{x^{2}}{2}B^{2} + O(x^{3})\quad(x\to 0). \tag{10.411}$$

From these equalities, the following formula can be confirmed:

$$\exp(x(A+B)) = \exp(xA)\exp(xB) + O(x^{2})\quad(x\to 0). \tag{10.412}$$

Moreover, we have

$$\exp(xA) = \Big[\exp\Big(\frac{x}{M}A\Big)\Big]^{M} + O\Big(\frac{x^{2}}{M}\Big)\quad(x^{2}\ll M), \tag{10.413}$$

$$\exp(x(A+B)) = \Big[\exp\Big(\frac{x}{M}A\Big)\exp\Big(\frac{x}{M}B\Big)\Big]^{M} + O\Big(\frac{x^{2}}{M}\Big)\quad(x^{2}\ll M). \tag{10.414}$$
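The $O(x^{2}/M)$ behavior of the product formula in Eq. (10.414) can be observed numerically for two non-commuting matrices. The sketch below is an illustration only: the function names are hypothetical, the matrices $A$ and $B$ are chosen as the longitudinal and transverse parts of a two-node transverse Ising energy with arbitrary example values, and the matrix exponential is computed by eigendecomposition, which assumes real symmetric inputs.

```python
import numpy as np

def expm_sym(A):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def trotter_product(A, B, x, M):
    """[exp(xA/M) exp(xB/M)]^M, the right-hand side of the product formula."""
    step = expm_sym(x / M * A) @ expm_sym(x / M * B)
    return np.linalg.matrix_power(step, M)

def trotter_error(A, B, x, M):
    """Frobenius norm of exp(x(A+B)) minus the M-slice product."""
    return np.linalg.norm(expm_sym(x * (A + B)) - trotter_product(A, B, x, M))

I2 = np.eye(2)
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

# Example values (assumptions): J = 1, h = 0.3, Gamma = 0.8, x = 1.
J, h, gamma, x = 1.0, 0.3, 0.8, 1.0
A = J * np.kron(sz, sz) + h * (np.kron(sz, I2) + np.kron(I2, sz))  # diagonal part
B = gamma * (np.kron(sx, I2) + np.kron(I2, sx))                    # transverse part

errors = {M: trotter_error(A, B, x, M) for M in (10, 100, 1000)}
```

Increasing $M$ by a factor of ten should reduce the error by roughly the same factor, which is the practical meaning of the $O(x^{2}/M)$ term.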


Generally, for a graph (V, E) with the set of nodes У = {1,2,---, N} and the set of edges 


E = {{i, j)], we have 


M 
2 
Z у; sua) -| П ool at) +о(®) (х2 « M), (10.415) 
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These are referred to as a Suzuki-Trotter Decomposition [87, 88]. 
For the case of N = 2, we consider an energy matrix H defined by 


$$H = -J\sigma_{1}^{z}\sigma_{2}^{z} - h\sigma_{1}^{z} - h\sigma_{2}^{z} - \Gamma\sigma_{1}^{x} - \Gamma\sigma_{2}^{x}.$$


(10.416) 


(10.417) 
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It is referred to as a quantum transverse Ising model on a chain (V = {1,2}, E = {{1, 2}}) 
with two nodes and one edge. By using the above Suzuki-Trotter formula, we have


1 
(51,1. 82, ew (- rrr) 51.1. $2,2) 


M 
| 1 1 
= „in [orl p Vua taa зоў))е(т-туг (ге И ref) | 


= d. > э P» oD З Qo» T2 2 55 assi 1982, aris 


991, 16972 1 EQS] 269052 2Є0т] 2є07 5€ s1, y es) MEQ 


1 
x П СЕСЕ (Јо 04 t hoy t hag) Jimi a тз) 


m=1 


x (rim; raymlexp( (Cox + гаф) зна). (10.418) 


ЕТМ 
Note that

$$\Gamma\sigma_{1}^{x} + \Gamma\sigma_{2}^{x} = \Gamma\sigma^{x}\otimes I + \Gamma I\otimes\sigma^{x}. \tag{10.419}$$


By using Eq. (10.253), Eq. (10.418) can be rewritten as 


1 
(81,1, slew(- н) 151191) 


^ ^ ^ ^ 
о $ x90» РТ 
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ms m k TM PI. , т * 


(10.420) 
Moreover, by the definition of the tensor product for 497 and I&A for any matrix A 
in terms of Eq. (10.238), we have 


1 1 
(Slym soml(ex0( 7.5707 )er) (гео го") Jims 52,т+1) 


1 1 
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Equation (10.420) can be rewritten in terms of the two-dimensional representation 
as 


kgT 
^ ^ ^ ^ 
= lim tee 8 8 
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Eventually, the density matrix P of the transverse Ising model for two nodes in Eq. 
(10.417), such that 


(аги) 
[oir] 


can be reduced to the probability distribution PD (s1, 52, s3, - SM, sy 44) for sm = 


P= (10.423) 


Ga (т = 1,2,-++,M + 1) on the 2x(M + 1) ladder graph as follows: 
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where 
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1:52,1 


Г п) 


1 
xew( kgT M (K(MT, D)s1,m51,m4-1 + K(MT, Ганн). 
(10.426) 
cosh( rT 
K(T,T) = kgTIn com apr) (10.427) 
sin (x1 r) 


The density matrix P in Eq. (10.423) of the transverse Ising model for |У| nodes 
V = (1,2, :::, |у}, which is given by Eqs. (10.226) and (10.227) with Eq. (10.301), 
can be reduced to the matrix representation for s probability distribution 


Sim 
52.m 
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(10.430) 


The dynamics of quantum Monte Carlo methods based on Suzuki-Trotter decompositions
have been analyzed by using Glauber dynamics [101, 102] and Langevin
dynamics [103, 104]. These analyses have recently been applied to some statistical machine
learning systems with quantum annealing [105, 106]. Some statistical analyses of
quantum Monte Carlo methods for statistical inference based on Suzuki-Trotter
decompositions [87, 88] are presented in Chaps. 12 and 13 of Part III of this book.

We now try to construct a modification of the conventional quantum message 
passing rule in Eq. (10.335) for the transverse Ising model in Eqs. (10.226) and 
(10.227) with Eq. (10.301) by imposing the assumption that all off-diagonal elements 
ofa j; апал; j for any edge (i, j (c E) are zero. By using the Suzuki-Trotter formulas 
in Eqs. (10.415)-(10.416), Eq. (10.335) can be represented as follows: 
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The sufficient conditions for Eq. (10.432) аге given by 
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By taking the summations $^ --- Y^ and the limit M> + оо on both sides of Eq. 
5269 Si MEQ 
(10.433), modified message passing rules can be derived as follows: 


1 1 
ew( plicit Ыт 7ie =) 
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We remark that the modified message passing rules of Eq. (10.434) can be derived by 
considering the Bethe free energy functional in the cluster variation method with a 
ladder-type basic cluster for the probabilistic graphical model [107] in Eqs. (10.429)- 
(10.430). While the conventional framework of quantum belief propagations was 
given as a quantum cluster variation method in Ref. [90], some extensions of loopy 
belief propagations have been proposed in Refs. [108—111] from a quantum statistical 
mechanical standpoint. 
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10.6 Concluding Remarks 


This chapter explored sublinear modeling based on statistical mechanical informat- 
ics for statistical machine learning. In statistical machine learning, we need to com- 
pute some statistical quantities in massive probabilistic graphical models. Statistical 
mechanical informatics can provide us with many statistical approximate computa- 
tional techniques. One is the advanced mean-field framework, which includes mean- 
field methods and loopy belief propagation methods such as the Bethe approximation. 
The advanced mean-field framework can provide good accuracy for statistical quantities,
including averages and covariances. Statistical quantities computed by advanced
mean-field methods in probabilistic graphical models sometimes exhibit phase
transitions. As shown in Sect. 10.3.3, there are two familiar types,
namely, first- and second-order phase transitions. Each step
of the EM algorithm is often affected by the first-order phase transition because the 
internal energy in the prior probabilistic model has a discontinuity. This difficulty 
appears in the convergence procedure of the EM algorithm, in which the trajectory of 
a hyperparameter passes through not only the equilibrium state but also metastable 
and unstable states in the loopy belief propagation of probabilistic segmentations in 
Sect. 10.3.5. We show that some algorithms based on loopy belief propagation in 
probabilistic segmentations can be accelerated by the inverse real-space renormal- 
ization group techniques in Sect. 10.3.6. 

The second part of this chapter explored quantum statistical machine learning 
and some statistical approximate algorithms in quantum statistical mechanical infor- 
matics for realizing the framework. Quantum mechanical computations for machine 
learning are rapidly developing in terms of both academic research and industrial 
implementation. In Sect. 10.4, we explained the modeling framework of density 
matrices and some fundamental mathematics for it and expanded the modeling frame- 
work to the quantum expectation-maximization algorithm. In Sect. 10.5, we showed 
the fundamental frameworks of quantum loopy belief propagation and quantum sta- 
tistical mechanical extensions of the adaptive TAP method. Moreover, we reviewed 
the Suzuki-Trotter expansion, and the real and the momentum space renormalization 
group for sublinear modeling of density matrices. 

Recently, frameworks for large-scale mathematical modeling in statistical machine
learning theory have been established for many practical applications, most notably
sparse modeling [4, 5] and deep learning [10]. Many academic
researchers are interested in interpreting such models from the standpoint of
probabilistic graphical models in statistical mathematics [2, 3, 7] and statistical
mechanical informatics [8, 9, 13, 17]. Novel technologies for realizing quantum
computing, such as the D-Wave quantum annealer, are now available from the standpoint
of quantum mechanical extensions of statistical mechanical informatics.
Some results in which D-Wave quantum annealers have achieved high-performance
computing have appeared in Refs. [112-116]. Some recent developments
in probabilistic graphical modeling, together with static and dynamical analyses of
advanced mean-field methods, Suzuki-Trotter decompositions, and
replica methods for realizing sublinear modeling, are presented from the statistical
mechanical point of view in the subsequent Chaps. 12 and 13 of the present part of
this book.
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Chapter 11 A) 
Empirical Bayes Method for Boltzmann get 
Machines 


Muneki Yasuda 


Abstract The framework of the empirical Bayes method allows the estimation of the 
values of the hyperparameters in the Boltzmann machine by maximizing a specific 
likelihood function referred to as the empirical Bayes likelihood function. However, 
the maximization is computationally difficult because the empirical Bayes likelihood 
function involves intractable integrations of the partition function. The method
presented in this chapter avoids this computational problem by using the replica
method and the Plefka expansion. The resulting procedure is simple and fast because
it does not require any iterative procedures, and it gives reasonable estimates under
certain conditions.


11.1 Introduction 


Boltzmann machine learning (BML) [1] has been actively studied in the fields of 
machine learning and statistical mechanics. In statistical mechanics, the problem of 
BML is sometimes referred to as the inverse Ising problem because a Boltzmann 
machine is the same as an Ising model, and it can be treated as an inverse problem 
for the Ising model. The framework of the usual BML is as follows. Given a set 
of observed data points, the appropriate values of the Boltzmann machine parame- 
ters, namely the biases and couplings, are estimated through maximum likelihood 
(ML) estimation. Because BML involves intractable multiple summations (i.e., evaluation
of the partition function), several approximations have been proposed for it
from the viewpoint of statistical mechanics [2]. Examples include methods based on 
mean-field approximations (e.g., the Plefka expansion [3] and the cluster variation 
method [4]) [5-11] and methods based on other approximations [12-14]. 

This chapter focuses on another type of learning problem for the Boltzmann
machine. Consider the prior distributions of the Boltzmann machine parameters and
assume that the prior distributions are governed by some hyperparameters. The
introduction of the prior distributions is strongly connected to regularized ML
estimation, in which the hyperparameters can be regarded as regularization
coefficients. The regularized ML estimation is important for preventing overfitting
to the dataset. As mentioned above, the usual BML aims to optimize the values of the
Boltzmann machine parameters using a set of observed data points. However, the aim
of the problem presented in this chapter is the estimation of the appropriate values
of the hyperparameters from the dataset without estimating the specific values of the
Boltzmann machine parameters. From the Bayesian viewpoint, this can be potentially
accomplished by the empirical Bayes method (also known as type-II ML estimation or
evidence approximation) [15, 16]. The schemes of the usual BML and the problem
investigated in this chapter are illustrated in Fig. 11.1.

Fig. 11.1 Illustration of the scheme of the empirical Bayes method considered in this chapter

Recently, an effective algorithm was proposed for the empirical Bayes method 
for the Boltzmann machine [17]. Using this method, the hyperparameter estimates 
can be obtained without costly operations. This chapter aims to explain this effective 
method. 

The rest of this chapter is organized as follows. The formulations of the Boltzmann 
machine and its usual and regularized ML estimations are presented in Sect. 11.2. 
The empirical Bayes method for the Boltzmann machine is presented in Sect. 11.3. 
Section 11.4 describes a statistical mechanical analysis for the empirical Bayes 
method and an inference algorithm obtained from the analysis. Experimental results 
for the presented algorithm are presented in Sect. 11.5. The summary and some dis- 
cussions are presented in Sect. 11.6. The appendices for this chapter are given in 
Sect. 11.7. 


11.2 Boltzmann Machine with Prior Distributions 


Consider a fully connected Boltzmann machine with $n$ (bipolar) variables
$S := \{S_i \in \{-1, +1\} \mid i = 1, 2, \ldots, n\}$ [1]:

$$P(S \mid h, J) := \frac{1}{Z(h, J)} \exp\Big( h \sum_{i=1}^{n} S_i + \sum_{i<j} J_{ij} S_i S_j \Big), \tag{11.1}$$

where $\sum_{i<j}$ is the sum over all distinct pairs of variables, that is,
$\sum_{i<j} = \sum_{i=1}^{n} \sum_{j=i+1}^{n}$. $Z(h, J)$ is the partition function defined by

$$Z(h, J) := \sum_{S} \exp\Big( h \sum_{i=1}^{n} S_i + \sum_{i<j} J_{ij} S_i S_j \Big),$$

where $\sum_{S}$ is the sum over all possible configurations of $S$, that is,

$$\sum_{S} = \prod_{i=1}^{n} \sum_{S_i = \pm 1}.$$

The parameters $h \in (-\infty, +\infty)$ and $J := \{J_{ij} \in (-\infty, +\infty) \mid i < j\}$ denote the
bias and couplings, respectively.

Given $N$ observed data points, $D := \{S^{(\mu)} \in \{-1, +1\}^n \mid \mu = 1, 2, \ldots, N\}$, the
log-likelihood function is defined as

$$L_{\mathrm{ML}}(h, J) := \frac{1}{nN} \sum_{\mu=1}^{N} \ln P(S^{(\mu)} \mid h, J). \tag{11.2}$$

The maximization of the log-likelihood function with respect to $h$ and $J$ (i.e., the
ML estimation) corresponds to BML (or the inverse Ising problem), that is,

$$\{\hat{h}_{\mathrm{ML}}, \hat{J}_{\mathrm{ML}}\} = \arg\max_{h, J} L_{\mathrm{ML}}(h, J). \tag{11.3}$$

However, the exact ML estimates cannot be obtained because the gradients of the
log-likelihood function include intractable sums over $O(2^n)$ terms.
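
To make the exponential cost concrete, the following is a minimal sketch that evaluates Eqs. (11.1) and (11.2) exactly by enumerating all $2^n$ configurations. The function names and the representation of $J$ as a nested list whose entries `J[i][j]` with `i < j` hold the couplings are assumptions made for this illustration only; the approach is feasible only for very small $n$.

```python
import itertools
import math


def energy_exponent(s, h, J):
    """Exponent of Eq. (11.1): h * sum_i S_i + sum_{i<j} J_ij S_i S_j."""
    n = len(s)
    field = h * sum(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return field + pair


def partition_function(n, h, J):
    """Exact Z(h, J) obtained by enumerating all 2**n configurations."""
    return sum(
        math.exp(energy_exponent(s, h, J))
        for s in itertools.product((-1, 1), repeat=n)
    )


def log_likelihood(data, h, J):
    """L_ML(h, J) of Eq. (11.2), normalized by n * N."""
    n = len(data[0])
    log_z = math.log(partition_function(n, h, J))
    total = sum(energy_exponent(s, h, J) - log_z for s in data)
    return total / (n * len(data))
```

Each call to `partition_function` visits $2^n$ terms, which is the bottleneck that the approximations discussed in this chapter are designed to avoid.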

We now introduce the prior distributions of the parameters $h$ and $J$ as
$P_{\mathrm{prior}}(h \mid H)$ and

$$P_{\mathrm{prior}}(J \mid \gamma) = \prod_{i<j} P_{\mathrm{prior}}(J_{ij} \mid \gamma), \tag{11.4}$$

where $H$ and $\gamma$ are the hyperparameters of these prior distributions. One of the
most important motivations for introducing the prior distributions is the Bayesian
interpretation of the regularized ML estimation [16]. Given the observed dataset $D$,
using the prior distributions, the posterior distribution of $h$ and $J$ is expressed as

$$P_{\mathrm{post}}(h, J \mid D, H, \gamma) = \frac{P(D \mid h, J)\, P_{\mathrm{prior}}(h \mid H)\, P_{\mathrm{prior}}(J \mid \gamma)}{P(D \mid H, \gamma)}, \tag{11.5}$$

where

$$P(D \mid h, J) := \prod_{\mu=1}^{N} P(S^{(\mu)} \mid h, J).$$

The denominator of Eq. (11.5) is sometimes referred to as the evidence. Using the
posterior distribution, the maximum a posteriori (MAP) estimation of the parameters is
obtained as

$$\{\hat{h}_{\mathrm{MAP}}, \hat{J}_{\mathrm{MAP}}\} = \arg\max_{h, J} L_{\mathrm{MAP}}(h, J), \tag{11.6}$$

where

$$L_{\mathrm{MAP}}(h, J) := \frac{1}{nN} \ln P_{\mathrm{post}}(h, J \mid D, H, \gamma) = L_{\mathrm{ML}}(h, J) + \frac{1}{nN} R_0(h) + \frac{1}{nN} R_1(J) + \text{constant}. \tag{11.7}$$


The MAP estimation of Eq. (11.6) corresponds to the regularized ML estimation,
in which $R_0(h) := \ln P_{\mathrm{prior}}(h \mid H)$ and $R_1(J) := \ln P_{\mathrm{prior}}(J \mid \gamma)$ work as penalty
terms. For example, (i) when the prior distribution of $J$ is a Gaussian prior,

$$P_{\mathrm{prior}}(J_{ij} \mid \gamma) = \sqrt{\frac{n}{2\pi\gamma}} \exp\Big( -\frac{n J_{ij}^2}{2\gamma} \Big), \quad \gamma > 0, \tag{11.8}$$

$R_1(J)$ corresponds to the $L_2$ regularization term and $\gamma$ corresponds to its coefficient;
(ii) when the prior distribution of $J$ is a Laplace prior,

$$P_{\mathrm{prior}}(J_{ij} \mid \gamma) = \sqrt{\frac{n}{2\gamma}} \exp\Big( -\sqrt{\frac{2n}{\gamma}}\, |J_{ij}| \Big), \quad \gamma > 0, \tag{11.9}$$

$R_1(J)$ corresponds to the $L_1$ regularization term and $\gamma$ again corresponds to its
coefficient. The variances of these prior distributions are identical, that is,
$\mathrm{Var}[J_{ij}] = \gamma / n$.
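
The statement that both priors share the variance $\gamma / n$ can be checked numerically. The sketch below draws couplings from Eqs. (11.8) and (11.9) using only the Python standard library; the function names are hypothetical, and the Laplace sampler relies on the standard fact that the difference of two independent exponential variables with the same rate is Laplace distributed.

```python
import math
import random


def sample_gaussian_coupling(gamma, n, rng):
    """Draw J_ij from the Gaussian prior of Eq. (11.8): mean 0, variance gamma / n."""
    return rng.gauss(0.0, math.sqrt(gamma / n))


def sample_laplace_coupling(gamma, n, rng):
    """Draw J_ij from the Laplace prior of Eq. (11.9).

    The density is proportional to exp(-|J| / b) with b = sqrt(gamma / (2 n)).
    The difference of two independent exponentials with mean b is Laplace(0, b).
    """
    rate = math.sqrt(2.0 * n / gamma)  # rate = 1 / b
    return rng.expovariate(rate) - rng.expovariate(rate)


def sample_variance(samples):
    """Plain (biased) sample variance, sufficient for a sanity check."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)
```

With a large number of draws, `sample_variance` should be close to `gamma / n` for both samplers, up to statistical fluctuations.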

In the following, we use the Gaussian prior for $J$ and, as a simple test case, the
following prior for $h$:

$$P_{\mathrm{prior}}(h \mid H) = \delta(h - H), \tag{11.10}$$

where $\delta(x)$ is the Dirac delta function; that is, in this test case, $h$ is not
distributed but fixed at $H$. It is noteworthy that the resultant algorithm obtained
based on the Gaussian prior can be applied to the case of the Laplace prior without
modification [17].


11.3 Empirical Bayes Method


Using the empirical Bayes method, the values of the hyperparameters, $H$ and $\gamma$,
can be inferred from the observed dataset, $D$. For the empirical Bayes method, a
marginal log-likelihood function is defined as

$$L_{\mathrm{EB}}(H, \gamma) := \frac{1}{nN} \ln \big[ P(D \mid h, J) \big]_{h, J}, \tag{11.11}$$

where $[\cdots]_{h, J}$ is the average over the prior distributions, that is,

$$[\cdots]_{h, J} := \int dh \int dJ \, (\cdots)\, P_{\mathrm{prior}}(h \mid H)\, P_{\mathrm{prior}}(J \mid \gamma).$$

This marginal log-likelihood function is referred to as the empirical Bayes likelihood
function in this section. From the perspective of the empirical Bayes method, the
optimal values of the hyperparameters, $\hat{H}$ and $\hat{\gamma}$, are obtained by maximizing the
empirical Bayes likelihood function, that is,

$$\{\hat{H}, \hat{\gamma}\} = \arg\max_{H, \gamma} L_{\mathrm{EB}}(H, \gamma). \tag{11.12}$$

It is noteworthy that $[P(D \mid h, J)]_{h, J}$ in Eq. (11.11) is identified as the evidence
appearing in Eq. (11.5).

The marginal log-likelihood function can be rewritten as

$$L_{\mathrm{EB}}(H, \gamma) = \frac{1}{nN} \ln \big[ \exp\big( nN L_{\mathrm{ML}}(h, J) \big) \big]_{h, J}. \tag{11.13}$$

Consider the case $N \gg n$. In this case, by using the saddle point evaluation,
Eq. (11.13) is reduced to

$$L_{\mathrm{EB}}(H, \gamma) \approx \frac{1}{nN} \ln P_{\mathrm{prior}}(\hat{h}_{\mathrm{ML}} \mid H) + \frac{1}{nN} \ln P_{\mathrm{prior}}(\hat{J}_{\mathrm{ML}} \mid \gamma) + \text{constant}.$$

In this case, the empirical Bayes estimates $\{\hat{H}, \hat{\gamma}\}$ thus converge to the ML estimates
of the hyperparameters in the prior distributions in which the ML estimates of the
parameters $\{\hat{h}_{\mathrm{ML}}, \hat{J}_{\mathrm{ML}}\}$ (i.e., the solution for BML) are inserted. This indicates that
parameter estimations can be conducted independently of hyperparameter estimation.
This trivial case is not considered in this section. Remember that the objective
is to estimate the hyperparameter values without estimating the specific values of the
parameters.


11.4 Statistical Mechanical Analysis of Empirical Bayes
Likelihood


The empirical Bayes likelihood function in Eq. (11.11) involves intractable multi- 
ple integrations. This section presents an evaluation of the empirical Bayes likeli- 
hood function using statistical mechanical analysis. The outline of the evaluation is 
as follows. First, the intractable multiple integrations in Eq. (11.11) are evaluated 
using the replica method [18, 19]. This evaluation leads to a quantity with a certain 
intractable multiple summation. The quantity is approximately evaluated using the 
Plefka expansion [3]. Thus, from the two approximations, the replica method and 
Plefka expansion, the evaluation result for the empirical Bayes likelihood function 
is obtained. 


11.4.1 Replica Method 


The empirical Bayes likelihood function in Eq. (11.11) can be represented as

$$L_{\mathrm{EB}}(H, \gamma) = \frac{1}{nN} \ln \lim_{x \to -1} \Psi_x(H, \gamma), \tag{11.14}$$

where

$$\Psi_x(H, \gamma) := \Big[ Z(h, J)^{xN} \exp N \Big( h \sum_{i=1}^{n} d_i + \sum_{i<j} J_{ij} d_{ij} \Big) \Big]_{h, J}, \tag{11.15}$$

and

$$d_i := \frac{1}{N} \sum_{\mu=1}^{N} S_i^{(\mu)}, \qquad d_{ij} := \frac{1}{N} \sum_{\mu=1}^{N} S_i^{(\mu)} S_j^{(\mu)}$$

are the sample averages of the observed data points. We now assume that $\tau_x := xN$
is a natural number larger than zero. Accordingly, Eq. (11.15) can be expressed as

$$\Psi_x(H, \gamma) = \bigg[ \sum_{S_x} \exp\bigg\{ h \sum_{i=1}^{n} \Big( \sum_{a=1}^{\tau_x} S_i^{\{a\}} + N d_i \Big) + \sum_{i<j} J_{ij} \Big( \sum_{a=1}^{\tau_x} S_i^{\{a\}} S_j^{\{a\}} + N d_{ij} \Big) \bigg\} \bigg]_{h, J}, \tag{11.16}$$

where $a, b \in \{1, 2, \ldots, \tau_x\}$ are the replica indices and $S_i^{\{a\}}$ is the $i$th variable in the
$a$th replica. $S_x := \{S_i^{\{a\}} \mid i = 1, 2, \ldots, n;\ a = 1, 2, \ldots, \tau_x\}$ is the set of all
variables in the replicated system (see Fig. 11.2) and $\sum_{S_x}$ is the sum over all possible
configurations of $S_x$, that is,

$$\sum_{S_x} = \prod_{i=1}^{n} \prod_{a=1}^{\tau_x} \sum_{S_i^{\{a\}} = \pm 1}.$$

Fig. 11.2 Illustration of the replicated system. The $\tau_x$ replicas, $S^{\{1\}}, S^{\{2\}}, \ldots, S^{\{\tau_x\}}$, arise
from $Z(h, J)^{\tau_x}$ in Eq. (11.15) (panel labels: original system, replicated system)

We evaluate $\Psi_x(H, \gamma)$ under the assumption that $\tau_x$ is a natural number, and then
we take the limit of $x \to -1$ from the evaluation result as an analytic continuation,¹
to obtain the empirical Bayes likelihood function (this is the so-called replica trick).
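¹ The justification for this analytic continuation may not be guaranteed mathematically. Thus, this
type of analysis is regarded as a "trick."

By employing the Gaussian prior in Eq. (11.8), Eq. (11.16) becomes

$$\Psi_x(H, \gamma) = \exp\Big[ nN H M + \frac{\gamma (n-1) N^2}{4} \Big( C_2 + \frac{x}{N} \Big) - F_x(H, \gamma) \Big], \tag{11.17}$$

where

$$M := \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad C_k := \frac{2}{n(n-1)} \sum_{i<j} d_{ij}^{\,k}, \tag{11.18}$$

and

$$F_x(H, \gamma) := -\ln \sum_{S_x} \exp\big( -E_x(S_x; H, \gamma) \big) \tag{11.19}$$

is the replicated (Helmholtz) free energy [20-23], where

$$E_x(S_x; H, \gamma) := -H \sum_{i=1}^{n} \sum_{a=1}^{\tau_x} S_i^{\{a\}} - \frac{\gamma N}{n} \sum_{i<j} d_{ij} \sum_{a=1}^{\tau_x} S_i^{\{a\}} S_j^{\{a\}} - \frac{\gamma}{n} \sum_{i<j} \sum_{a<b} S_i^{\{a\}} S_j^{\{a\}} S_i^{\{b\}} S_j^{\{b\}} \tag{11.20}$$

is the Hamiltonian (or energy function) of the replicated system, where $\sum_{a<b}$ is the
sum over all distinct pairs of replicas, that is, $\sum_{a<b} = \sum_{a=1}^{\tau_x} \sum_{b=a+1}^{\tau_x}$.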
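
The step from Eq. (11.16) to Eq. (11.17) is a Gaussian integral over each coupling. As a sketch of the intermediate algebra (the shorthand $A_{ij}$ is introduced here only for illustration and does not appear in the original text), one can use $(S_i^{\{a\}})^2 = 1$ and the variance $\gamma / n$ of the prior in Eq. (11.8):

$$\begin{aligned}
A_{ij} &:= \sum_{a=1}^{\tau_x} S_i^{\{a\}} S_j^{\{a\}} + N d_{ij}, \\
\int dJ_{ij}\, P_{\mathrm{prior}}(J_{ij} \mid \gamma)\, e^{J_{ij} A_{ij}} &= \exp\Big( \frac{\gamma}{2n} A_{ij}^2 \Big), \\
A_{ij}^2 &= \tau_x + N^2 d_{ij}^2 + 2 N d_{ij} \sum_{a=1}^{\tau_x} S_i^{\{a\}} S_j^{\{a\}} + 2 \sum_{a<b} S_i^{\{a\}} S_j^{\{a\}} S_i^{\{b\}} S_j^{\{b\}}, \\
\frac{\gamma}{2n} \sum_{i<j} \big( \tau_x + N^2 d_{ij}^2 \big) &= \frac{\gamma (n-1) N^2}{4} \Big( C_2 + \frac{x}{N} \Big).
\end{aligned}$$

The last line is the configuration-independent prefactor in Eq. (11.17), and the two configuration-dependent terms of $A_{ij}^2$, multiplied by $\gamma / (2n)$, give the coupling terms of the Hamiltonian in Eq. (11.20).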


11.4.2 Plefka Expansion 


Because the replicated free energy in Eq. (11.19) includes intractable multiple sum- 
mations, an approximation is required to proceed with the current evaluation. In this 
section, the replicated free energy in Eq. (11.19) is approximated using the Plefka 
expansion [3]. In brief, the Plefka expansion is a perturbative expansion in Gibbs 
free energy that is a dual form of a corresponding Helmholtz free energy. 

The Gibbs free energy is obtained as

$$G_x(m, H, \gamma) = -n \tau_x H m + \operatorname*{extr}_{\lambda} \Big\{ \lambda n \tau_x m - \ln \sum_{S_x} \exp\big( -E_x(S_x; \lambda, \gamma) \big) \Big\}. \tag{11.21}$$

The derivation of this Gibbs free energy is described in Sect. 11.7.1. The summation
in Eq. (11.21) can be performed when $\gamma = 0$, which gives

$$G_x(m, H, 0) = -n \tau_x H m + n \tau_x \operatorname*{extr}_{\lambda} \{ \lambda m - \ln(2 \cosh \lambda) \} = -n \tau_x H m + n \tau_x e(m), \tag{11.22}$$

where $e(m)$ is the negative mean-field entropy defined by

$$e(m) := \frac{1+m}{2} \ln \frac{1+m}{2} + \frac{1-m}{2} \ln \frac{1-m}{2}. \tag{11.23}$$

In the context of the Plefka expansion, the Gibbs free energy $G_x(m, H, \gamma)$ is
approximated by the perturbation from $G_x(m, H, 0)$. Expanding $G_x(m, H, \gamma)$ around
$\gamma = 0$ gives

$$\frac{G_x(m, H, \gamma)}{nN} = -x H m + x e(m) + \phi_x^{(1)}(m) \gamma + \phi_x^{(2)}(m) \gamma^2 + O(\gamma^3), \tag{11.24}$$

where $\phi_x^{(1)}(m)$ and $\phi_x^{(2)}(m)$ are the expansion coefficients defined by

$$\phi_x^{(k)}(m) := \frac{1}{nN k!} \lim_{\gamma \to 0} \frac{\partial^k G_x(m, H, \gamma)}{\partial \gamma^k}.$$

The forms of the two coefficients are presented in Eqs. (11.34) and (11.35) in
Sect. 11.7.2.
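
The identity used in Eq. (11.22), namely that the extremum of $\lambda m - \ln(2 \cosh \lambda)$ over $\lambda$ equals $e(m)$, can be verified numerically. The following is a small sketch with hypothetical function names; it uses the fact that the extremum is attained at $\lambda = \operatorname{atanh} m$.

```python
import math


def negative_entropy(m):
    """e(m) of Eq. (11.23), valid for -1 < m < 1."""
    p, q = (1.0 + m) / 2.0, (1.0 - m) / 2.0
    return p * math.log(p) + q * math.log(q)


def legendre_value(m):
    """Value of lambda * m - ln(2 cosh(lambda)) at its extremum lambda = atanh(m)."""
    lam = math.atanh(m)
    return lam * m - math.log(2.0 * math.cosh(lam))
```

For any magnetization strictly between $-1$ and $1$, the two functions agree to floating-point accuracy.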


From Eqs. (11.14), (11.17), (11.24), and (11.33), the approximation of the
empirical Bayes likelihood function is obtained as

$$L_{\mathrm{EB}}(H, \gamma) \approx H M - \operatorname*{extr}_{m} \big[ H m - e(m) + \Phi(m) \gamma + \phi^{(2)}(m) \gamma^2 \big], \tag{11.25}$$

where

$$\Phi(m) := \phi^{(1)}(m) - \frac{(n-1) N}{4n} \Big( C_2 - \frac{1}{N} \Big).$$

The forms of $\phi^{(1)}(m)$ and $\phi^{(2)}(m)$ are presented in Eqs. (11.37) and (11.38) in
Sect. 11.7.2.


11.4.3 Algorithm for Hyperparameter Estimation


As mentioned in Sect. 11.3, the empirical Bayes inference is achieved by maximizing
$L_{\mathrm{EB}}(H, \gamma)$ with respect to $H$ and $\gamma$ (cf. Eq. (11.12)). The extremum condition in
Eq. (11.25) with respect to $H$ leads to

$$\hat{m} = M, \tag{11.26}$$

where $\hat{m}$ is the value of $m$ that satisfies the extremum condition in Eq. (11.25). By
combining the extremum condition of Eq. (11.25) with respect to $m$ with Eq. (11.26),

$$H = \operatorname{atanh} M - \Big( \frac{\partial \Phi(M)}{\partial M} \gamma + \frac{\partial \phi^{(2)}(M)}{\partial M} \gamma^2 \Big) \tag{11.27}$$

is obtained, where $\operatorname{atanh} x$ is the inverse function of $\tanh x$. From Eqs. (11.25) and
(11.26), the optimal value of $\gamma$ is obtained by

$$\hat{\gamma} = \arg\max_{\gamma \geq 0} \big[ -\Phi(M) \gamma - \phi^{(2)}(M) \gamma^2 \big]. \tag{11.28}$$

Since Eq. (11.28) represents a univariate quadratic optimization, $\hat{\gamma}$ is immediately
obtained as follows: (i) when $\phi^{(2)}(M) > 0$ and $\Phi(M) \geq 0$, or when $\phi^{(2)}(M) = 0$
and $\Phi(M) > 0$, $\hat{\gamma} = 0$; (ii) when $\phi^{(2)}(M) > 0$ and $\Phi(M) < 0$,
$\hat{\gamma} = -\Phi(M) / (2 \phi^{(2)}(M))$; and (iii) $\hat{\gamma} \to \infty$ elsewhere. The case of
$\phi^{(2)}(M) = \Phi(M) = 0$ is ignored because it may be rarely observed in realistic
settings. Using Eqs. (11.27) and (11.28), the solution to the empirical Bayes inference
can be obtained without any iterative process. The pseudocode of the presented
procedure is shown in Algorithm 1. The order of the computational complexity of the
presented method is $O(N n^2)$. Remember that the order of the computational complexity
of the exact ML estimation is $O(2^n)$.
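
The $O(N n^2)$ cost comes from the pairwise sample averages $d_{ij}$. A minimal sketch of how $M$, $C_1$, and $C_2$ of Eq. (11.18) could be computed from a dataset is shown below; the function name and the list-of-tuples data layout are assumptions for illustration, and the quantity defined in Eq. (11.36) is deliberately not included.

```python
def data_statistics(data):
    """Return (M, C_1, C_2) of Eq. (11.18) from a list of +/-1 configurations."""
    n_samples = len(data)
    n = len(data[0])
    # d_i: sample average of each variable.
    d = [sum(s[i] for s in data) / n_samples for i in range(n)]
    m_bar = sum(d) / n
    c1 = 0.0
    c2 = 0.0
    # d_ij: sample average of each pair product. The loop over all pairs,
    # each costing O(N), is the O(N n^2) part of the procedure.
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = sum(s[i] * s[j] for s in data) / n_samples
            c1 += d_ij
            c2 += d_ij ** 2
    norm = 2.0 / (n * (n - 1))
    return m_bar, norm * c1, norm * c2
```

For example, `data_statistics([(1, 1, 1), (1, -1, 1)])` yields $M = 2/3$ and $C_1 = C_2 = 1/3$, since only the pair $(1, 3)$ has a nonzero sample correlation.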


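Algorithm 1 Proposed Inference Algorithm


1: Input: observed dataset $D := \{S^{(\mu)} \in \{-1, +1\}^n \mid \mu = 1, 2, \ldots, N\}$.
2: Compute $M$, $\Omega$, $C_1$, and $C_2$ using the dataset according to Eqs. (11.18) and (11.36).
3: Determine $\hat{\gamma}$ using Eq. (11.28):

$$\hat{\gamma} = \begin{cases} 0 & \text{case (i)} \\ -\Phi(M) / (2 \phi^{(2)}(M)) & \text{case (ii)} \\ \infty & \text{elsewhere,} \end{cases}$$

   where case (i): $\phi^{(2)}(M) > 0$, $\Phi(M) \geq 0$ or $\phi^{(2)}(M) = 0$, $\Phi(M) > 0$, and
   case (ii): $\phi^{(2)}(M) > 0$, $\Phi(M) < 0$.

4: Using $\hat{\gamma}$, determine $\hat{H}$ using Eq. (11.27).

5: Output $\hat{\gamma}$ and $\hat{H}$.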
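
Steps 3 and 4 of Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the values of $\Phi(M)$, $\phi^{(2)}(M)$ and their derivatives with respect to $M$ are taken as inputs that the caller has already computed from Sect. 11.7.2, and the function and argument names are hypothetical.

```python
import math


def estimate_gamma(phi_cap, phi2):
    """Maximize -phi_cap * g - phi2 * g**2 over g >= 0, as in Eq. (11.28).

    phi_cap stands for Phi(M) and phi2 for phi^(2)(M); both are assumed to be
    precomputed by the caller.
    """
    if phi_cap == 0.0 and phi2 == 0.0:
        raise ValueError("degenerate case Phi(M) = phi2(M) = 0 is not handled")
    if (phi2 > 0.0 and phi_cap >= 0.0) or (phi2 == 0.0 and phi_cap > 0.0):
        return 0.0  # case (i): the objective is non-increasing for g >= 0
    if phi2 > 0.0 and phi_cap < 0.0:
        return -phi_cap / (2.0 * phi2)  # case (ii): vertex of the concave parabola
    return math.inf  # case (iii): the objective is unbounded above


def estimate_field(m_bar, gamma_hat, dphi_cap, dphi2):
    """Eq. (11.27): H = atanh(M) - (dPhi/dM * gamma + dphi2/dM * gamma**2)."""
    return math.atanh(m_bar) - (dphi_cap * gamma_hat + dphi2 * gamma_hat ** 2)
```

Because the estimate of $\gamma$ is obtained in closed form and then inserted into the expression for $H$, no iteration is involved, which mirrors the non-iterative character of the procedure described above.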


In the presented method, the value of $H$ does not affect the determination of $\hat{\gamma}$.
Several mean-field-based methods for BML (e.g., those listed in Sect. 11.1) have similar
procedures, in which $\hat{J}_{\mathrm{ML}}$ is determined separately from $\hat{h}_{\mathrm{ML}}$. This is a common
property of the mean-field-based methods for BML, including the current empirical
Bayes problem.

Although the presented method is derived based on the Gaussian prior presented
in Eq. (11.8), the same procedure can be applied to the case of the Laplace prior
presented in Eq. (11.9) [17].


11.5 Demonstration 


This section discusses the results of numerical experiments. In these experiments,
the observed dataset $D$ was generated by the generative Boltzmann machine (gBM),
which has the same form as Eq. (11.1), via Gibbs sampling (with a
simulated-annealing-like strategy). The parameters of gBM were drawn from the prior
distributions in Eqs. (11.4) and (11.10). This implies that the model-matched case (i.e.,
the generative and learning models are identical) was considered. In the following,
the notation $\alpha := N / n$ and $J := \sqrt{\gamma}$ are used. The standard deviations of the
Gaussian prior in Eq. (11.8) and the Laplace prior in Eq. (11.9) can thus be represented
as $J / \sqrt{n}$. The hyperparameters of gBM are denoted by $H_{\mathrm{true}}$ and $J_{\mathrm{true}}$.


11.5.1 Gaussian Prior Case


We now consider the case in which the prior distribution of $J$ is the Gaussian prior
in Eq. (11.8). In this case, the Boltzmann machine corresponds to the
Sherrington-Kirkpatrick (SK) model [24], and thus exhibits a spin-glass transition at
$J = 1$ when $h = 0$ (i.e., when $H = 0$).
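
As an illustration of how a dataset of this kind could be generated, the sketch below draws Gaussian couplings with standard deviation $J / \sqrt{n}$ and samples configurations of Eq. (11.1) by plain single-site Gibbs updates. It is not the procedure used in the experiments: the simulated-annealing-like strategy mentioned above is omitted, and the function names and the `burn_in` and `thin` settings are arbitrary assumptions.

```python
import math
import random


def draw_couplings(n, j_std, rng):
    """Symmetric couplings with J_ij ~ Normal(0, j_std**2 / n) for i != j."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, j_std / math.sqrt(n))
    return J


def gibbs_sample(n, h, J, n_samples, rng, burn_in=100, thin=5):
    """Draw configurations from Eq. (11.1) by single-site Gibbs updates."""
    s = [rng.choice((-1, 1)) for _ in range(n)]
    samples = []
    total_sweeps = burn_in + thin * n_samples
    for sweep in range(total_sweeps):
        for i in range(n):
            local = h + sum(J[i][j] * s[j] for j in range(n) if j != i)
            # P(S_i = +1 | rest) = 1 / (1 + exp(-2 * local field))
            p_up = 1.0 / (1.0 + math.exp(-2.0 * local))
            s[i] = 1 if rng.random() < p_up else -1
        # Keep one configuration every `thin` sweeps after the burn-in period.
        if sweep >= burn_in and (sweep - burn_in) % thin == thin - 1:
            samples.append(tuple(s))
    return samples
```

In the spin-glass phase ($J > 1$ at $H = 0$) plain Gibbs sampling of this kind is expected to mix slowly, which is presumably why an annealing-like strategy was used in the experiments.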

We consider the case $H_{\mathrm{true}} = 0$. The scatter plots for the estimation of $J$ for
various $J_{\mathrm{true}}$ when $H_{\mathrm{true}} = 0$ and $\alpha = 0.4$ are shown in Fig. 11.3. When
$J_{\mathrm{true}} < 1$, our estimates of $J$ are in good agreement with $J_{\mathrm{true}}$, whereas the
agreement deteriorates for $J_{\mathrm{true}} > 1$. This implies that the validity of our
perturbative approximation is lost in the spin-glass phase, as is often the case
with several mean-field approximations. Figure 11.4 shows the scatter plots for
various $\alpha$. A smaller $\alpha$ causes $J$ to be overestimated and a larger $\alpha$ causes it to
be underestimated. In our experiments, at least, the optimal value of $\alpha$ seems to be
$\alpha_{\mathrm{opt}} \approx 0.4$ when $H_{\mathrm{true}} = 0$. Our method can also estimate $H$. The results for the
estimation of $H$ when $H_{\mathrm{true}} = 0$ and $\alpha = 0.4$ are shown in Fig. 11.5. Figure 11.5a,
b shows the average of $|H_{\mathrm{true}} - \hat{H}|$ (i.e., the mean absolute error (MAE)) and the
standard deviation of $\hat{H}$ over 300 experiments, respectively. The MAE and standard
deviation increase in the region where $J_{\mathrm{true}} > 1$.


Fig. 11.3 Scatter plots of $J_{\mathrm{true}}$ (horizontal axis) versus the inferred $J$ (vertical axis) when
$H_{\mathrm{true}} = 0$ and $\alpha = 0.4$: (a) $n = 300$ and (b) $n = 500$. These plots represent the average
values over 300 experiments


11.5.2 Laplace Prior Case


We now consider the case in which the prior distribution of $J$ is the Laplace prior in
Eq. (11.9). The scatter plots for the estimation of $J$ for various values of $J_{\mathrm{true}}$ when
$H_{\mathrm{true}} = 0$ are shown in Fig. 11.6. The plots shown in Fig. 11.6 almost completely
overlap with those in Fig. 11.4.


Fig. 11.4 Scatter plots of $J_{\mathrm{true}}$ (horizontal axis) versus the inferred $J$ (vertical axis) for
various $\alpha = N / n$ when $H_{\mathrm{true}} = 0$: (a) $n = 300$ and (b) $n = 500$. These plots represent the
average values over 300 experiments

Fig. 11.5 Results of estimation of $\hat{H}$ versus $J_{\mathrm{true}}$ when $H_{\mathrm{true}} = 0$ and $\alpha = 0.4$: (a) the MAE
and (b) standard deviation. These plots represent the average values over 300 experiments

Fig. 11.6 Scatter plots of $J_{\mathrm{true}}$ (horizontal axis) versus the inferred $J$ (vertical axis) for
various $\alpha = N / n$, when $H_{\mathrm{true}} = 0$, in the case of the Laplace prior: (a) $n = 300$ and
(b) $n = 500$. These plots represent the average values over 300 experiments


11.6 Summary and Discussion 


This chapter describes the hyperparameter inference algorithm proposed in [17]. As
evident from the numerical experiments, the proposed inference method works
efficiently in both the Gaussian and Laplace prior cases, except in the spin-glass phase.
However, the presented method has the drawback that it is sensitive to the value of
$\alpha = N / n$. In the experiments in Sect. 11.5, although $\alpha \approx 0.4$ was appropriate when
$H_{\mathrm{true}} = 0$, it is known that the appropriate value decreases as $H$ increases [17].
Since we cannot know the value of $H_{\mathrm{true}}$ in advance, the appropriate setting of $\alpha$ is
also unknown. Estimation of $\alpha_{\mathrm{opt}}$ is an open problem. It seems unnatural that
an optimal value of $\alpha$ exists, because larger datasets are usually better in machine
learning. Such peculiar behavior can be attributed to the truncating approximation in
the Plefka expansion. A more detailed discussion of this issue is presented in [17].

Finally, we review the presented method from the perspective of sublinear
computation without considering the aforementioned issues. The Boltzmann machine
given in Eq. (11.1) has $p$ parameters, where $p = O(n^2)$. In usual machine learning,
$N = O(p)$ is, at least, required to obtain a good ML estimate for the Boltzmann
machine. Therefore, a hyperparameter inference "without" the empirical Bayes
method (namely, the strategy in which the hyperparameters are inferred through
the ML estimate in a similar manner as that discussed in the latter part of Sect.
11.3) requires a dataset of size $O(p)$. However, the presented method requires only
$N = O(n) = O(\sqrt{p})$ because $\alpha = O(1)$ with respect to $n$.


11.7 Appendices


11.7.1 Appendix 1: Gibbs Free Energy 


In this appendix, we derive the Gibbs free energy for the replicated (Helmholtz) free
energy in Eq. (11.19).

The replicated free energy is obtained by minimizing the variational free energy,
defined by

$$\mathcal{F}_x[Q] := \sum_{S_x} E_x(S_x; H, \gamma)\, Q(S_x) + \sum_{S_x} Q(S_x) \ln Q(S_x), \tag{11.29}$$

under the normalization constraint, that is, $\sum_{S_x} Q(S_x) = 1$, where $Q(S_x)$ is a test
distribution over $S_x$, and $E_x(S_x; H, \gamma)$ is the Hamiltonian for the replicated system
defined in Eq. (11.20).

The Gibbs free energy is obtained by adding new constraints to the minimization
of $\mathcal{F}_x[Q]$. We add the relationship

$$m = \frac{1}{n \tau_x} \sum_{i=1}^{n} \sum_{a=1}^{\tau_x} \sum_{S_x} S_i^{\{a\}} Q(S_x) \tag{11.30}$$

as the constraint. Using Lagrange multipliers, the Gibbs free energy is obtained as

$$G_x(m, H, \gamma) = \operatorname*{extr}_{Q, r, \lambda} \bigg\{ \mathcal{F}_x[Q] - r \Big( \sum_{S_x} Q(S_x) - 1 \Big) - \lambda \Big( \sum_{i=1}^{n} \sum_{a=1}^{\tau_x} \sum_{S_x} S_i^{\{a\}} Q(S_x) - n \tau_x m \Big) \bigg\}, \tag{11.31}$$

where "extr" denotes the extremum with respect to the assigned parameters, and $r$
and $\lambda$ are the Lagrange multipliers for the normalization constraint of $Q(S_x)$ and
the constraint in Eq. (11.30), respectively. Performing the extremum operation with
respect to $Q(S_x)$ and $r$ in Eq. (11.31) gives

$$G_x(m, H, \gamma) = \operatorname*{extr}_{\lambda} \Big[ \lambda n \tau_x m - \ln \sum_{S_x} \exp\big( -E_x(S_x; H + \lambda, \gamma) \big) \Big]. \tag{11.32}$$

The replicated free energy in Eq. (11.19) coincides with the extremum of this Gibbs
free energy with respect to $m$, that is,

$$F_x(H, \gamma) = \operatorname*{extr}_{m} G_x(m, H, \gamma). \tag{11.33}$$

By performing the shift $H + \lambda \to \lambda$ in Eq. (11.32), Eq. (11.21) is obtained.


11.7.2 Appendix 2: Coefficients of Plefka Expansion 


This appendix presents the coefficients of the Plefka expansion in Eq. (11.24). Refer
to Ref. [17] for a detailed derivation. The first-order coefficient is given by

$$\phi_x^{(1)}(m) = -\frac{x (n-1) N C_1}{2n} m^2 - \frac{(n-1) K_x}{2nN} m^4, \tag{11.34}$$

where $K_x := \tau_x (\tau_x - 1) / 2$. The second-order coefficient is given by


(n—1?uNQ , (n= DANC _ эу 


pP (т) = эз (IL — т?) a 
(n — D) К.С 2 242 (n — DK, 4 242 
2 m*(1 — m^) AUN (n + т, 3)m (1 — m^) 
(n — DK, 
HL (1— т®)?, (11.35) 


where © in the first term of Eq. (11.35) is defined as 
Уо уа (11.36) 
Е ES i? Ж, " ij , > 


where $(i) := \{1, 2, \ldots, n\} \setminus \{i\}$. When $x = -1$, these coefficients are

$$\phi^{(1)}(m) = \frac{(n-1) N C_1}{2n} m^2 - \frac{(n-1)(N+1)}{4n} m^4, \tag{11.37}$$
eO = айн т?у+ € ee a = т2)2 

(n - u EOG as. uid 

(n – Hees DR ee Dore D a s^. 


(11.38)


Chapter 12
Dynamical Analysis of Quantum
Annealing


Anthony C. C. Coolen, Theodore Nikoletopoulos, Shunta Arai, 
and Kazuyuki Tanaka 


Abstract Quantum annealing aims to provide a faster method than classical com- 
puting for finding the minima of complicated functions, and it has created increasing 
interest in the relaxation dynamics of quantum spin systems. Moreover, problems in 
quantum annealing caused by first-order phase transitions can be reduced via appro- 
priate temporal adjustment of control parameters, and in order to do this optimally, 
it is helpful to predict the evolution of the system at the level of macroscopic observ- 
ables. Solving the dynamics of quantum ensembles is nontrivial, requiring modeling 
of both the quantum spin system and its interaction with the environment with which it 
exchanges energy. An alternative approach to the dynamics of quantum spin systems 
was proposed about a decade ago. It involves creating stochastic proxy dynamics 
via the Suzuki-Trotter mapping of the quantum ensemble to a classical one (the 
quantum Monte Carlo method), and deriving from this new dynamics closed macro- 
scopic equations for macroscopic observables using the dynamical replica method. 
In this chapter, we give an introduction to this approach, focusing on the ideas and 
assumptions behind the derivations, and on its potential and limitations. 


12.1 Quantum Ensembles and Their Dynamics 


We imagine an ensemble of $K$ independent quantum systems $|\psi_\alpha\rangle$, labeled by
$\alpha = 1 \ldots K$, all with the same Hamiltonian but distinct initial conditions. Making a
measurement of an observable $A$ in this ensemble means randomly picking one of
the $K$ systems, with equal probabilities, and measuring $A$ in the selected system. The
average of the observable $A$ can then be written as $\langle A \rangle = \mathrm{Tr}(\rho A)$, where $\rho$, the
density matrix, is the Hermitian nonnegative definite operator
$\rho = K^{-1} \sum_\alpha |\psi_\alpha\rangle \langle \psi_\alpha|$, with $\mathrm{Tr}(\rho) = 1$. Since $\rho$ is Hermitian it has a complete basis
of eigenstates $\{|k\rangle\}$. Its eigenvalues $w_k$, which are nonnegative and normalized according
to $\sum_k w_k = 1$, can be interpreted as probabilities. We can now write
$\langle A \rangle = \sum_n a_n \sum_k w_k |\langle k | n \rangle|^2$. Hence the probability of measuring eigenvalue $a_n$ of
observable $A$ in the ensemble is $P_n = \sum_k w_k |\langle k | n \rangle|^2$, where $|\langle k | n \rangle|^2$ is the probability
of observing $a_n$ in eigenstate $k$ of the density matrix, and $w_k$ is the probability of
finding the ensemble in eigenstate $k$.

The evolution of the density matrix follows from the evolution of the states $|\psi_\alpha\rangle$,
each governed by the Schrödinger equation, giving $\frac{d}{dt} \rho = (i\hbar)^{-1} [H, \rho]$. The solution
is $\rho_t = e^{-iHt/\hbar} \rho_0\, e^{iHt/\hbar}$. In particular, it follows using the eigenbasis $\{|E\rangle\}$ of $H$
that

$$\langle H \rangle_t = \sum_E E \langle E | \rho_t | E \rangle = \langle H \rangle_{t=0}. \tag{12.1}$$

At equilibrium $[H, \rho] = 0$. The density matrix can therefore be diagonalized
simultaneously with $H$, that is, $\rho = \sum_E f(E) |E\rangle \langle E|$. The values of $f(E)$ define the
type of equilibrium ensemble at hand. In the canonical ensemble we have
$f(E) = \exp(-\beta E) / Z(\beta)$, so

$$\rho = \frac{1}{Z(\beta)} \sum_E e^{-\beta E} |E\rangle \langle E| = \frac{1}{Z(\beta)} e^{-\beta H}. \tag{12.2}$$

The quantum partition function $Z(\beta)$ follows from $\mathrm{Tr}(\rho) = 1$: $Z(\beta) = \mathrm{Tr}(e^{-\beta H})$.
The free energy and the average internal energy are given by $F = -\beta^{-1} \log Z(\beta)$
and $\mathcal{E} = -\frac{\partial}{\partial \beta} \log Z(\beta)$. The expectation values of operators become
$\langle A \rangle = Z(\beta)^{-1} \mathrm{Tr}(e^{-\beta H} A)$. Note that if the systems of the ensemble evolve strictly
according to the Schrödinger equation, there cannot be generic evolution of $\rho$ toward the
equilibrium form in Eq. (12.2). For any initial density operator with $\langle H \rangle_{t=0} \neq \mathcal{E}$ this is
ruled out by Eq. (12.1). Since the state in Eq. (12.2) describes the result of equilibration
of quantum systems in a heat bath with which they can exchange energy, a correct
description of the dynamics requires a Hamiltonian that also describes the degrees
of freedom of the heat bath.
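
The canonical-ensemble relations above can be illustrated for a system whose energy eigenvalues are known. The sketch below works in the eigenbasis of $H$, where the density matrix of Eq. (12.2) is diagonal; the function names and the list-of-eigenvalues interface are assumptions for illustration.

```python
import math


def canonical_weights(energies, beta):
    """Diagonal of rho = exp(-beta H) / Z(beta) in the eigenbasis of H, plus Z."""
    boltzmann = [math.exp(-beta * e) for e in energies]
    z = sum(boltzmann)
    return [w / z for w in boltzmann], z


def average_energy(energies, beta):
    """<H> = Tr(rho H), evaluated in the eigenbasis of H."""
    weights, _ = canonical_weights(energies, beta)
    return sum(w * e for w, e in zip(weights, energies))


def free_energy(energies, beta):
    """F = -log(Z(beta)) / beta."""
    _, z = canonical_weights(energies, beta)
    return -math.log(z) / beta
```

For a two-level system with energies $\pm 1$, `average_energy` reduces to $-\tanh \beta$, and a finite-difference derivative of $\log Z(\beta)$ with respect to $\beta$ reproduces the same value with opposite sign, in line with the expression for the average internal energy.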

This is the first obstacle in the analysis of the dynamics of quantum ensembles: it is 
difficult even to write down the correct microscopic dynamical laws. A similar situa- 
tion occurs also in the classical setting. Without a heat bath we have a micro-canonical 
ensemble with conserved energy. Deriving the Gibbs-Boltzmann distribution from 
the joint dynamics of the system and heat bath requires us to connect determinis- 
tic trajectories to invariant measures via ergodic theory and to subsequently derive 
the form of these measures, which has so far proven possible for only a handful of 
models. 

The approach followed in [1] was to circumvent ensembles altogether and solve
the Schrödinger equation for small systems in which a decaying transverse field acts
as quantum noise (which is indeed what happens in quantum annealing). In classical
systems one often defines the problem away: one constructs an intuitively reasonable
stochastic process that evolves toward the Gibbs-Boltzmann state, usually of the
Markov Chain Monte Carlo (MCMC) form. This process is studied as a proxy for
the dynamics of the original system. The price paid is that one cannot be sure to what
extent the stochastic dynamics are close to those of the original system. The MCMC
equations are not even unique, since there are many choices that evolve to the
Gibbs-Boltzmann state. The same strategy can be applied to the dynamics of quantum
systems if the latter can be mapped to classical ones. This is achieved by the
Suzuki-Trotter formalism [4].


12.2 Quantum Monte Carlo Dynamics 


In order to apply quantum annealing to optimization problems formulated in terms
of binary variables, one needs spin-$\frac{1}{2}$ particles [1]. These are labeled by $i = 1 \ldots N$,
with Pauli matrices $\{\sigma_i^x, \sigma_i^y, \sigma_i^z\}$. In the standard representation of $\sigma^z$-eigenstates:

$$\sigma^x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma^z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

In quantum annealing one chooses Hamiltonians of the form $H = H_0 + H_1$, in which
$H_0$ is obtained by replacing the classical spins $\sigma_i = \pm 1$ in an Ising Hamiltonian by
the matrices $\sigma_i^z$, and a second part $H_1$ acts as a form of quantum noise:¹

$$H_0 = -\sum_{i<j} J_{ij} \sigma_i^z \sigma_j^z - h \sum_i \sigma_i^z, \qquad H_1 = -\Gamma \sum_i \sigma_i^x. \tag{12.3}$$

$H_0$ represents the quantity to be minimized in our optimization problem. The classical
state achieving this minimum follows from the quantum ground state of the system
upon moving the parameters $\Gamma$ and $\beta^{-1}$ adiabatically slowly to zero and is hence
obtained from the partition function $Z(\beta) = \mathrm{Tr}(e^{-\beta H_0 - \beta H_1})$. For excellent reviews
of the physics and the applications of the above types of quantum spin systems with
transverse fields, we refer to [2, 3].

The Suzuki-Trotter procedure [4] allows us to convert the above quantum problem 
into a classical one using the operator identity 


M 
e^*? — Jim oi adi | (12.4) 


М->со 


From now on we assume that A апа В are Hermitian operators, and we write the 
basis of eigenstates of A as {|n)}. We then obtain after some simple manipulations: 


' For simplicity, we choose Ho here to be quadratic in the spins, and the external field to be uniform, 
but this is not essential. 
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+ M a 
Tr(e^*^) = lim у; eL [DT (ne? Ings). — 25) 


nj... M k, mod(M) 


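Application to $A = -\beta H_0$ and $B = -\beta H_1$, where the relevant basis is that of the
joint eigenstates of all $\{\sigma_i^z\}$, that is, $|s_1, \ldots, s_N\rangle = |s_1\rangle \otimes \ldots \otimes |s_N\rangle$, with $s_i = \pm 1$
and $\sigma_i^z|s_1, \ldots, s_N\rangle = s_i|s_1, \ldots, s_N\rangle$, gives $Z(\beta) = \lim_{M\to\infty} Z_M(\beta)$, where

\[ Z_M(\beta) = \sum_{\{s_{ik}\}} e^{\frac{\beta}{M}\sum_{k=1}^{M}\left[\sum_{i<j} J_{ij}s_{ik}s_{jk} + h\sum_i s_{ik}\right]} \prod_{k,\ \mathrm{mod}(M)}\ \prod_{i=1}^{N} \langle s_{ik}|e^{(\beta\Gamma/M)\sigma^x}|s_{i,k+1}\rangle \]
\[ \phantom{Z_M(\beta)} = e^{\frac{1}{2}NM\log[\frac{1}{2}\sinh(2\beta\Gamma/M)]} \sum_{\{s_{ik}\}} e^{\frac{\beta}{M}\sum_{k}\sum_{i<j} J_{ij}s_{ik}s_{jk} + \frac{\beta h}{M}\sum_{k}\sum_i s_{ik} + B\sum_{k,\ \mathrm{mod}(M)}\sum_i s_{ik}s_{i,k+1}}, \tag{12.6} \]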


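in which $B = -\frac{1}{2}\log\tanh(\beta\Gamma/M)$. Thus the partition function of the $N$-spin quantum
system is mapped (apart from a constant) onto the limit $M \to \infty$ of that of a
classical Ising model with $NM$ spins $s = \{s_{ik}\}$, with Hamiltonian $H(s)$ and asymptotic
free energy density $f = \lim_{M\to\infty}\lim_{N\to\infty} f_{N,M}$:

\[ H(s) = -\frac{1}{M}\sum_{k=1}^{M}\sum_{i<j} J_{ij}s_{ik}s_{jk} - \frac{h}{M}\sum_{k=1}^{M}\sum_i s_{ik} - \frac{B}{\beta}\sum_{k,\ \mathrm{mod}(M)}\sum_i s_{ik}s_{i,k+1}, \tag{12.7} \]

\[ f_{N,M} = -\frac{M}{2\beta}\log\Big[\frac{1}{2}\sinh(2\beta\Gamma/M)\Big] - \frac{1}{\beta N}\log \sum_{\{s_{ik}=\pm 1\}} e^{\frac{\beta}{M}\sum_{k}\sum_{i<j} J_{ij}s_{ik}s_{jk} + \frac{\beta h}{M}\sum_{k}\sum_i s_{ik} + B\sum_{k,\ \mathrm{mod}(M)}\sum_i s_{ik}s_{i,k+1}}. \tag{12.8} \]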
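The step from the first to the second line of Eq. (12.6) rests on rewriting each matrix element $\langle s|e^{a\sigma^x}|s'\rangle$, with $a = \beta\Gamma/M$, as $[\frac{1}{2}\sinh(2a)]^{1/2} e^{Bss'}$. A small sketch that checks this identity for all four spin pairs (the function names are illustrative, not from the text):

```python
import math

def trotter_coupling(beta, gamma, M):
    """B = -1/2 log tanh(beta*Gamma/M), the inter-slice coupling of Eq. (12.6)."""
    return -0.5 * math.log(math.tanh(beta * gamma / M))

def matrix_element(a, s, s_next):
    """<s| exp(a sigma^x) |s'>: cosh(a) if s == s', sinh(a) otherwise."""
    return math.cosh(a) if s == s_next else math.sinh(a)

def ising_form(beta, gamma, M, s, s_next):
    """Classical form sqrt(sinh(2a)/2) * exp(B s s') with a = beta*Gamma/M."""
    a = beta * gamma / M
    prefactor = math.sqrt(0.5 * math.sinh(2.0 * a))
    return prefactor * math.exp(trotter_coupling(beta, gamma, M) * s * s_next)

if __name__ == "__main__":
    beta, gamma, M = 2.0, 0.5, 8
    for s in (-1, 1):
        for s_next in (-1, 1):
            print(s, s_next, matrix_element(beta * gamma / M, s, s_next),
                  ising_form(beta, gamma, M, s, s_next))
```

Taking the product of the prefactor over all $NM$ matrix elements gives the constant $e^{\frac{1}{2}NM\log[\frac{1}{2}\sinh(2\beta\Gamma/M)]}$ in front of the classical sum.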


The new system in Eq. (12.8), for $M \to \infty$ equivalent to the original quantum one,
lends itself to constructing a stochastic dynamics. We first write the Suzuki-Trotter
Hamiltonian in the standard form of $NM$ interacting Ising spins in an external field:

\[ H(s) = -\frac{1}{2}\sum_{ik,\,j\ell} s_{ik}\,J_{ik,j\ell}\,s_{j\ell} - \theta\sum_{ik} s_{ik}, \tag{12.9} \]

\[ J_{ik,j\ell} = \frac{J_{ij}}{M}\,\delta_{k\ell} + \frac{B}{\beta}\,\delta_{ij}\left(\delta_{k,\ell+1} + \delta_{\ell,k+1}\right), \qquad \theta = h/M. \tag{12.10} \]

The conventional Glauber dynamics by which this classical system evolves toward
the equilibrium state with the above Hamiltonian is, after switching to continuous
time [5] and denoting by $p_t(s)$ the probability of finding the system in state $s$ at time
$t$:

\[ \tau\frac{d}{dt}p_t(s) = \sum_{i=1}^{N}\sum_{k=1}^{M}\Big[p_t(F_{ik}s)\,w_{ik}(F_{ik}s) - p_t(s)\,w_{ik}(s)\Big], \tag{12.11} \]

\[ w_{ik}(s) = \frac{1}{2}\big[1 - s_{ik}\tanh(\beta h_{ik}(s))\big], \qquad h_{ik}(s) = \sum_{j\ell} J_{ik,j\ell}\,s_{j\ell} + \theta. \tag{12.12} \]


This master equation describes a process where at each step a site $i \in \{1, \ldots, N\}$ and
a Trotter slice $k \in \{1, \ldots, M\}$ are picked at random, followed by an attempt to flip
the spin $s_{ik}$. The $w_{ik}(s)$ denote transition rates for $s_{ik} \to -s_{ik}$. $F_{ik}$ is an operator that
flips spin $s_{ik}$ and leaves all others invariant. The parameter $\tau$ defines time units such
that the average duration of a single spin update is $\tau/N$. Working out the local fields
$h_{ik}(s)$ gives

\[ h_{ik}(s) = \frac{1}{M}\sum_{j\neq i} J_{ij}s_{jk} + \frac{B}{\beta}\left(s_{i,k+1} + s_{i,k-1}\right) + h/M. \tag{12.13} \]


The process in Eqs. (12.11, 12.12) is suitable for numerical simulation and defines
the quantum Monte Carlo dynamics for the ensemble with Hamiltonian given by Eq.
(12.3) provided we take $M \to \infty$. When applied to quantum annealing models, some
authors have called it 'simulated quantum annealing'. The definition in Eqs. (12.11,
12.12), however, is not unique. Many alternative stochastic processes evolve toward
the same Gibbs-Boltzmann state (see, e.g., [6]).
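That the rates in Eq. (12.12) leave the Gibbs-Boltzmann state of $H(s)$ invariant can be verified directly through detailed balance, $w_{ik}(s)\,e^{-\beta H(s)} = w_{ik}(F_{ik}s)\,e^{-\beta H(F_{ik}s)}$. The sketch below enumerates all states of a very small system; the sizes, the uniform couplings $J_{ij} = J_0/N$ and all parameter values are illustrative assumptions.

```python
import itertools
import math

def local_field(s, i, k, J, h, B, beta):
    """h_ik(s) of Eq. (12.13); s[i][k] = +-1, J symmetric with zero diagonal."""
    N, M = len(s), len(s[0])
    interaction = sum(J[i][j] * s[j][k] for j in range(N) if j != i) / M
    trotter = (B / beta) * (s[i][(k + 1) % M] + s[i][(k - 1) % M])
    return interaction + trotter + h / M

def energy(s, J, h, B, beta):
    """Suzuki-Trotter Hamiltonian H(s) of Eq. (12.7), periodic in the slice index."""
    N, M = len(s), len(s[0])
    e = 0.0
    for k in range(M):
        for i in range(N):
            e -= (h / M) * s[i][k]
            e -= (B / beta) * s[i][k] * s[i][(k + 1) % M]
            for j in range(i + 1, N):
                e -= (J[i][j] / M) * s[i][k] * s[j][k]
    return e

def rate(s, i, k, J, h, B, beta):
    """Glauber transition rate w_ik(s) of Eq. (12.12)."""
    return 0.5 * (1.0 - s[i][k] * math.tanh(beta * local_field(s, i, k, J, h, B, beta)))

def max_detailed_balance_violation(N=2, M=3, beta=1.3, gamma=0.7, h=0.4, J0=1.0):
    """Largest |w(s) e^{-beta H(s)} - w(Fs) e^{-beta H(Fs)}| over all states and flips."""
    B = -0.5 * math.log(math.tanh(beta * gamma / M))
    J = [[0.0 if i == j else J0 / N for j in range(N)] for i in range(N)]
    worst = 0.0
    for bits in itertools.product((-1, 1), repeat=N * M):
        s = [list(bits[i * M:(i + 1) * M]) for i in range(N)]
        for i in range(N):
            for k in range(M):
                flipped = [row[:] for row in s]
                flipped[i][k] = -flipped[i][k]
                lhs = rate(s, i, k, J, h, B, beta) * math.exp(-beta * energy(s, J, h, B, beta))
                rhs = rate(flipped, i, k, J, h, B, beta) * math.exp(-beta * energy(flipped, J, h, B, beta))
                worst = max(worst, abs(lhs - rhs))
    return worst

if __name__ == "__main__":
    print(max_detailed_balance_violation())
```

Any other choice of rates that satisfies the same balance condition would define an equally valid dynamics, which is the non-uniqueness noted above.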


12.3 Dynamical Replica Analysis 


The remaining challenge is to extract formulae describing the evolution of relevant 
macroscopic quantities from Eqs. (12.11, 12.12). This was addressed in [7-10] using 
the so-called dynamical replica method (DRT) [12-14]. In this chapter, we deviate 
from the definitions in [7-10] and stay closer to the original DRT ideas. 

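The dynamics (12.11, 12.12) imply that expectation values $\langle G(s)\rangle = \sum_s p_t(s)\,G(s)$
evolve according to:

\[ \tau\frac{d}{dt}\langle G(s)\rangle = \sum_{i=1}^{N}\sum_{k=1}^{M}\sum_s p_t(s)\,w_{ik}(s)\Big[G(F_{ik}s) - G(s)\Big]. \tag{12.14} \]

To study the joint dynamics of a set of $L$ observables $\Omega(s) = (\Omega_1(s), \ldots, \Omega_L(s))$
we substitute $G(s) = \delta[\Omega - \Omega(s)]$. Now $\langle G(s)\rangle = P_t(\Omega)$, and

\[ \tau\frac{d}{dt}P_t(\Omega) = \sum_{i=1}^{N}\sum_{k=1}^{M}\sum_s p_t(s)\,w_{ik}(s)\Big[\delta[\Omega - \Omega(F_{ik}s)] - \delta[\Omega - \Omega(s)]\Big]. \tag{12.15} \]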
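If the observables $\Omega_\mu(s)$ are $O(1)$ and macroscopic in nature, their susceptibility
to single spin flips $\Delta_{ik\mu}(s) = \Omega_\mu(F_{ik}s) - \Omega_\mu(s)$ will be small. We can then define
$\Delta_{ik} = (\Delta_{ik1}(s), \ldots, \Delta_{ikL}(s)) \in \mathbb{R}^L$, and expand (12.15) in a distributional sense,
that is,

\[ \tau\frac{d}{dt}\int d\Omega\, P_t(\Omega)\,G(\Omega) = \int d\Omega\, G(\Omega) \sum_{\ell\geq 1}\frac{(-1)^\ell}{\ell!} \sum_{\mu_1=1}^{L}\cdots\sum_{\mu_\ell=1}^{L} \frac{\partial^\ell}{\partial\Omega_{\mu_1}\cdots\partial\Omega_{\mu_\ell}} \sum_{i=1}^{N}\sum_{k=1}^{M}\Big\langle w_{ik}(s)\,\delta[\Omega - \Omega(s)]\,\Delta_{ik\mu_1}(s)\cdots\Delta_{ik\mu_\ell}(s)\Big\rangle. \tag{12.16} \]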


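We thereby arrive at the following Kramers-Moyal expansion

\[ \tau\frac{d}{dt}P_t(\Omega) = \sum_{\ell\geq 1}\frac{(-1)^\ell}{\ell!} \sum_{\mu_1=1}^{L}\cdots\sum_{\mu_\ell=1}^{L} \frac{\partial^\ell}{\partial\Omega_{\mu_1}\cdots\partial\Omega_{\mu_\ell}} \Big\{P_t(\Omega)\,F^{(\ell)}_{\mu_1\ldots\mu_\ell}[\Omega; t]\Big\}, \tag{12.17} \]

with

\[ F^{(\ell)}_{\mu_1\ldots\mu_\ell}[\Omega; t] = \Big\langle \sum_{i=1}^{N}\sum_{k=1}^{M} w_{ik}(s)\,\Delta_{ik\mu_1}(s)\cdots\Delta_{ik\mu_\ell}(s)\Big\rangle_{\Omega;t}, \tag{12.18} \]

\[ \langle G(s)\rangle_{\Omega;t} = \frac{\sum_s p_t(s)\,\delta[\Omega - \Omega(s)]\,G(s)}{\sum_s p_t(s)\,\delta[\Omega - \Omega(s)]}. \tag{12.19} \]

Asymptotically, that is, for $N, M \to \infty$, only the first term of Eq. (12.17) survives if

\[ \lim_{N,M\to\infty} \sum_{\ell\geq 2}\frac{1}{\ell!} \sum_{\mu_1=1}^{L}\cdots\sum_{\mu_\ell=1}^{L} \sum_{i=1}^{N}\sum_{k=1}^{M} \Big\langle \big|\Delta_{ik\mu_1}(s)\cdots\Delta_{ik\mu_\ell}(s)\big|\Big\rangle_{\Omega;t} = 0. \tag{12.20} \]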


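If all $\Delta_{ik\mu}(s)$ scale similarly, that is, $\exists\,\Delta_{N,M}$ such that $\Delta_{ik\mu}(s) = O(\Delta_{N,M})$ for
$N, M \to \infty$, then Eq. (12.17) retains only its first term if $\lim_{N,M\to\infty} L\Delta_{N,M}\sqrt{NM} = 0$.
In that case it reduces to a Liouville equation describing deterministic evolution
of $\Omega$:

\[ \tau\frac{d}{dt}\Omega_\mu = \Big\langle \sum_{i=1}^{N}\sum_{k=1}^{M} w_{ik}(s)\,\Delta_{ik\mu}(s)\Big\rangle_{\Omega;t}. \tag{12.21} \]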


If $\lim_{N,M\to\infty} L\Delta_{N,M}\sqrt{NM} > 0$, we can no longer ignore the fluctuations in our
observables $\Omega(s)$, which limits our choice of observables.

Equation (12.21) is closed if $\sum_{i=1}^{N}\sum_{k=1}^{M} w_{ik}(s)\,\Delta_{ik\mu}(s)$ is a function of $\Omega(s)$ only
(in which case $p_t(s)$ would simply drop out). If this is not the case, we close Eq. (12.21) using
a maximum entropy argument: we approximate $p_t(s)$ in Eq. (12.21) by a form that
assumes that all micro-states with the same value for $\Omega(s)$ are equally likely. Now
Eq. (12.21) becomes

\[ \tau\frac{d}{dt}\Omega_\mu = \frac{\sum_s \delta[\Omega - \Omega(s)]\sum_{i=1}^{N}\sum_{k=1}^{M} w_{ik}(s)\,\Delta_{ik\mu}(s)}{\sum_s \delta[\Omega - \Omega(s)]}. \tag{12.22} \]

Within the replica formalism [16, 17], this closed equation can also be written as

\[ \tau\frac{d}{dt}\Omega_\mu = \lim_{n\to 0}\sum_{s^1\ldots s^n}\Big(\prod_{\alpha=1}^{n}\delta[\Omega - \Omega(s^\alpha)]\Big)\sum_{i=1}^{N}\sum_{k=1}^{M} w_{ik}(s^1)\,\Delta_{ik\mu}(s^1). \tag{12.23} \]


The accuracy of Eq. (12.22) depends on our choice for the observables $\Omega_\mu(s)$. We
want them to be $O(1)$, obeying $\lim_{N,M\to\infty} L\Delta_{N,M}\sqrt{NM} = 0$, and such that the
probability equipartitioning assumption is as harmless as possible. Including $H(s)/N$
and $N^{-1}\log p_0(s)$ in our set of observables ensures that equipartitioning holds for
$t \to 0$ and $t \to \infty$. If we have disorder in the couplings $\{J_{ij}\}$, and for $N \to \infty$ our
observables are self-averaging with respect to its realization, we can average over
the disorder.² This gives

\[ \tau\frac{d}{dt}\Omega_\mu = \lim_{n\to 0}\sum_{s^1\ldots s^n}\overline{\Big(\prod_{\alpha=1}^{n}\delta[\Omega - \Omega(s^\alpha)]\Big)\sum_{i=1}^{N}\sum_{k=1}^{M} w_{ik}(s^1)\,\Delta_{ik\mu}(s^1)}, \tag{12.24} \]

where the overline denotes the average over the disorder.


For the system in Eq. (12.8) and the typical initial conditions in quantum annealing,
there are two natural and simple routes for choosing the observables in the DRT
method,³ both involving the normalized distinct energy contributions in Eq. (12.7):

• Trotter slice-dependent observables
We choose, for $k = 1 \ldots M$ (mod $M$),

\[ E_k(s) = \frac{1}{N}\sum_{i<j} J_{ij}s_{ik}s_{jk}, \qquad m_k(s) = \frac{1}{N}\sum_i s_{ik}, \qquad \mathcal{E}_k(s) = \frac{1}{N}\sum_i s_{ik}s_{i,k+1}. \tag{12.25} \]

Now $L = 3M$, and the susceptibilities of the observables to single spin flips are, using $\sum_j J_{ij}s_{jk} = O(1)$
for all $k$ (required for an extensive Hamiltonian):

\[ \Delta_{ik}E_q(s) = -2N^{-1}\delta_{qk}\,s_{ik}\sum_{j\neq i} J_{ij}s_{jk} = O(N^{-1}), \tag{12.26} \]

\[ \Delta_{ik}m_q(s) = -2N^{-1}\delta_{qk}\,s_{ik} = O(N^{-1}), \tag{12.27} \]

\[ \Delta_{ik}\mathcal{E}_q(s) = -2N^{-1}s_{ik}\left(\delta_{qk}\,s_{i,k+1} + \delta_{k,q+1}\,s_{i,k-1}\right) = O(N^{-1}). \tag{12.28} \]

2 Without disorder one does not need the replica formalism yet and can work directly with (12.22).


3 One can always add further observables, or split the present ones into distinct contributions. This
generally improves the accuracy of the theory provided $\lim_{N,M\to\infty} L\Delta_{N,M}\sqrt{NM} = 0$ still holds.
Hence $\Delta_{N,M} = N^{-1}$, so deterministic evolution requires that $M \ll N^{1/3}$ as $M, N \to \infty$. Hence, on
choosing Eq. (12.25) we can no longer take $M \to \infty$ before $N \to \infty$, which would have been the correct
order, and must rely on these limits commuting.⁴


• Trotter slice-independent observables
These are simply averages over all Trotter slices of the previous set in Eq. (12.25), that is,

\[ E(s) = \frac{1}{M}\sum_{k=1}^{M} E_k(s), \qquad m(s) = \frac{1}{M}\sum_{k=1}^{M} m_k(s), \qquad \mathcal{E}(s) = \frac{1}{M}\sum_{k=1}^{M} \mathcal{E}_k(s). \tag{12.29} \]

Hence $L = 3$, and the spin-flip susceptibilities come out as

\[ \Delta_{ik}E(s) = -2(NM)^{-1}s_{ik}\sum_{j\neq i} J_{ij}s_{jk} = O((NM)^{-1}), \tag{12.30} \]

\[ \Delta_{ik}m(s) = -2(NM)^{-1}s_{ik} = O((NM)^{-1}), \tag{12.31} \]

\[ \Delta_{ik}\mathcal{E}(s) = -2(NM)^{-1}s_{ik}\left(s_{i,k+1} + s_{i,k-1}\right) = O((NM)^{-1}). \tag{12.32} \]

Now $\Delta_{N,M} = 1/NM$. Deterministic evolution requires $\lim_{M,N\to\infty}(NM)^{-1/2} = 0$, which is always
true. We can therefore take our two limits in any desired order without having to worry about fluctuations
in our macroscopic observables.
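As a worked check of the two scaling statements, the criterion $\lim_{N,M\to\infty} L\Delta_{N,M}\sqrt{NM} = 0$ can be evaluated explicitly for both choices of observables, using only the values of $L$ and $\Delta_{N,M}$ quoted above:

```latex
\begin{align*}
\text{slice-dependent } (L = 3M,\ \Delta_{N,M} = N^{-1}):\quad
  & L\,\Delta_{N,M}\sqrt{NM} = 3M \cdot N^{-1} \cdot (NM)^{1/2} = 3\,M^{3/2}N^{-1/2}
    \;\to\; 0 \iff M \ll N^{1/3}, \\
\text{slice-independent } (L = 3,\ \Delta_{N,M} = (NM)^{-1}):\quad
  & L\,\Delta_{N,M}\sqrt{NM} = 3\,(NM)^{-1}(NM)^{1/2} = 3\,(NM)^{-1/2}
    \;\to\; 0 \ \text{ for any order of limits}.
\end{align*}
```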


12.4 Simple Examples 


We illustrate the previous approach by application to simple models. We investigate
the commutation of the limits $N \to \infty$ and $M \to \infty$, and the link between stationary
states of the dynamical equations and the equilibrium theory. We start with the simplest
case of non-interacting spins in a uniform x field, followed by non-interacting
spins in uniform x and z fields and ferromagnetically interacting quantum systems.


12.4.1 Non-interacting Quantum Spins in a Uniform x Field 


This is the simplest case of Eq. (12.8), where $h = J_{ij} = 0$ for all $(i, j)$. Although this
specific model is physically trivial, it is still instructive since it already reveals many
general features of the more general dynamical theory. The statics analysis gives

\[ Z_M(\beta) = \Big[e^{\frac{1}{2}M\log[\frac{1}{2}\sinh(2\beta\Gamma/M)]}\,\mathrm{Tr}(K^M)\Big]^N, \tag{12.33} \]

with a $2 \times 2$ transfer matrix of the one-dimensional Ising chain:

\[ K = \begin{pmatrix} e^{B} & e^{-B} \\ e^{-B} & e^{B} \end{pmatrix}, \qquad \text{eigenvalues: } \lambda_+ = 2\cosh(B),\ \lambda_- = 2\sinh(B). \tag{12.34} \]

After some rewriting and insertion of the definition of $B$ we obtain:

\[ Z_M(\beta) = \Big[e^{\frac{1}{2}M\log[\frac{1}{2}\sinh(2\beta\Gamma/M)]}\,2^M\big(\cosh^M(B) + \sinh^M(B)\big)\Big]^N = [2\cosh(\beta\Gamma)]^N. \tag{12.35} \]

This gives the correct free energy density $f_{N,M} = -\beta^{-1}\log[2\cosh(\beta\Gamma)]$.

4 The assumption that the order of the limits $N \to \infty$ and $M \to \infty$ can be changed is also made in
equilibrium studies such as [15], where steepest descent integration is used as $N \to \infty$ for fixed $M$.
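For non-interacting spins the result in Eq. (12.35) holds for every finite $M$, not only in the Trotter limit. A short numerical sketch of the per-spin factor (function name and parameter values are illustrative):

```python
import math

def single_spin_partition(beta, gamma, M):
    """Per-spin factor of Eq. (12.35): prefactor * (lambda_+^M + lambda_-^M)."""
    B = -0.5 * math.log(math.tanh(beta * gamma / M))
    prefactor = math.exp(0.5 * M * math.log(0.5 * math.sinh(2.0 * beta * gamma / M)))
    return prefactor * ((2.0 * math.cosh(B)) ** M + (2.0 * math.sinh(B)) ** M)

if __name__ == "__main__":
    beta, gamma = 1.5, 0.8
    for M in (1, 2, 5, 50):
        print(M, single_spin_partition(beta, gamma, M), 2.0 * math.cosh(beta * gamma))
```

The two printed columns agree up to rounding, so in this model the Trotter discretization introduces no error in the statics.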

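Next, we turn to the macroscopic dynamical equations in Eq. (12.21). Since $J_{ij} = 0$,
the order parameters $E_k(s)$ and $E(s)$ are always zero. The two dynamical routes
give:

• Trotter slice-dependent observables
The observables are $\{m_k(s), \mathcal{E}_k(s)\}$, and we are forced to take $N \to \infty$ before
$M \to \infty$. Using identities such as $\tanh[B(s+s')] = \frac{1}{2}(s+s')\tanh(2B)$ we obtain:

\[ \tau\frac{d}{dt}m_q = -m_q + \frac{1}{2}\left(m_{q+1} + m_{q-1}\right)\tanh(2B), \tag{12.36} \]

\[ \tau\frac{d}{dt}\mathcal{E}_q = \tanh(2B)\Big[1 + \frac{1}{2}\left(C_q + C_{q+1}\right)\Big] - 2\mathcal{E}_q, \tag{12.37} \]

in which, using the equivalence of the $N$ sites $i$, we have the 2-slice correlators:

\[ C_k = \frac{\sum_s \Big\{\prod_{q=1}^{M}\delta[m_q - m_q(s)]\,\delta[\mathcal{E}_q - \mathcal{E}_q(s)]\Big\}\, s_{1,k+1}s_{1,k-1}}{\sum_s \Big\{\prod_{q=1}^{M}\delta[m_q - m_q(s)]\,\delta[\mathcal{E}_q - \mathcal{E}_q(s)]\Big\}}. \tag{12.38} \]

One can compute these for $N \to \infty$ with fixed $M$ via steepest descent integration:

\[ C_k = \frac{\sum_{s_1\ldots s_M} e^{\sum_{q}(x_q s_q + y_q s_q s_{q+1})}\, s_{k+1}s_{k-1}}{\sum_{s_1\ldots s_M} e^{\sum_{q}(x_q s_q + y_q s_q s_{q+1})}}, \tag{12.39} \]

in which $x = (x_1, \ldots, x_M)$ and $y = (y_1, \ldots, y_M)$ are to be solved from

\[ m_k = \frac{\partial\log Z}{\partial x_k}, \qquad \mathcal{E}_k = \frac{\partial\log Z}{\partial y_k}, \qquad Z(x, y) = \sum_{s_1\ldots s_M} e^{\sum_{q}(x_q s_q + y_q s_q s_{q+1})}. \tag{12.40} \]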

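• Trotter slice-independent observables
In this case, we only have $m(s)$ and $\mathcal{E}(s)$, and working out Eq. (12.21) gives

\[ \tau\frac{d}{dt}m = -m\,[1 - \tanh(2B)], \qquad \tau\frac{d}{dt}\mathcal{E} = (1 + C)\tanh(2B) - 2\mathcal{E}, \tag{12.41} \]

with

\[ C = \frac{\sum_s \delta[m - m(s)]\,\delta[\mathcal{E} - \mathcal{E}(s)]\, s_{1,1}s_{1,3}}{\sum_s \delta[m - m(s)]\,\delta[\mathcal{E} - \mathcal{E}(s)]}. \tag{12.42} \]

Calculating the 2-slice correlator $C$ using steepest descent results in

\[ C = \frac{\sum_{s_1\ldots s_M} e^{\sum_{q}(x s_q + y s_q s_{q+1})}\, s_1 s_3}{\sum_{s_1\ldots s_M} e^{\sum_{q}(x s_q + y s_q s_{q+1})}}, \tag{12.43} \]

\[ m = \frac{1}{M}\frac{\partial\log Z}{\partial x}, \qquad \mathcal{E} = \frac{1}{M}\frac{\partial\log Z}{\partial y}, \qquad Z(x, y) = \sum_{s_1\ldots s_M} e^{\sum_{q}(x s_q + y s_q s_{q+1})}. \tag{12.44} \]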


If at time zero the $m_k$ and $\mathcal{E}_k$ in Eqs. (12.36, 12.37) are independent of $k$, this
will remain true at all times⁵ and the dynamics in Eqs. (12.36, 12.37) simplifies to
Eq. (12.41). Computing $C$ involves solving a one-dimensional Ising model with a
constant external field, whereas computing $C_k$ requires solving heterogeneous spin
chain models in equilibrium for arbitrary coupling constants and fields. This is the
second reason, in addition to the issue with limits, why it is preferable to work
with Trotter slice-independent observables.

For non-interacting spins with $h \neq 0$ the analysis is similar. Here $f = \lim_{M\to\infty} f_{N,M} = -\beta^{-1}\log[2\cosh(\beta\sqrt{\Gamma^2+h^2})]$, with equilibrium magnetisation

\[ m = -\partial f/\partial h = \frac{h\tanh(\beta\sqrt{\Gamma^2+h^2})}{\sqrt{\Gamma^2+h^2}}, \tag{12.45} \]

and the Trotter slice-independent observables are predicted to obey

\[ \tau\frac{d}{dt}m = \frac{1}{2}(1-C)\tanh(\beta h/M) + \frac{1}{2}Q_+(1+C) - m(1-Q_-), \tag{12.46} \]

\[ \tau\frac{d}{dt}\mathcal{E} = (1+C)\,Q_- + 2Q_+ m - 2\mathcal{E}, \tag{12.47} \]

with $Q_\pm = \frac{1}{2}[\tanh(\beta h/M + 2B) \pm \tanh(\beta h/M - 2B)]$. Since $\lim_{h\to 0} Q_+ = 0$ and
$\lim_{h\to 0} Q_- = \tanh(2B)$, Eqs. (12.46, 12.47) indeed revert back to Eq. (12.41) for
$h \to 0$. We inspect the fixed-points of Eqs. (12.46, 12.47) after having also added spin
interactions in the next section. Clearly, since $\lim_{M\to\infty} Q_+ = \lim_{M\to\infty}(1-Q_-) = 0$
the relaxation time of the system diverges for $M \to \infty$, with closer inspection revealing
that $dm/dt = O(M^{-2})$. This makes physical sense: for large $M$, hence large $B$,
the Trotter slices increasingly prefer identical states, so state changes (in a single
slice) become rare as they require the mounting energetic costs of breaking the Trotter
symmetry.
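The divergence of the relaxation time can be made quantitative with a few lines of code. For $h = 0$ the decay rate of $m$ in Eq. (12.41) is $1 - Q_- = 1 - \tanh(2B)$, and since $e^{-2B} = \tanh(\beta\Gamma/M)$ this equals $1 - 1/\cosh(2\beta\Gamma/M) \approx 2(\beta\Gamma/M)^2$. The sketch below (function names and parameter values are illustrative) evaluates $Q_\pm$ from their definition and rescales the rate by $M^2$.

```python
import math

def q_plus_minus(beta, gamma, h, M):
    """Q_+ and Q_- as defined below Eq. (12.47)."""
    B = -0.5 * math.log(math.tanh(beta * gamma / M))
    a = beta * h / M
    t_plus, t_minus = math.tanh(a + 2.0 * B), math.tanh(a - 2.0 * B)
    return 0.5 * (t_plus + t_minus), 0.5 * (t_plus - t_minus)

def scaled_relaxation_rate(beta, gamma, M):
    """M^2 * (1 - Q_-) at h = 0, i.e. M^2 times the decay rate of m in Eq. (12.41)."""
    _, q_minus = q_plus_minus(beta, gamma, 0.0, M)
    return M * M * (1.0 - q_minus)

if __name__ == "__main__":
    beta, gamma = 2.0, 0.5
    for M in (4, 16, 64, 256):
        print(M, scaled_relaxation_rate(beta, gamma, M), 2.0 * (beta * gamma) ** 2)
```

The rescaled rate approaches $2\beta^2\Gamma^2$, so the unscaled rate vanishes as $M^{-2}$.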


5 In [7, 8, 10] this is called the static approximation. 
12.4.2 Ferromagnetic z-interactions and Uniform x and z Fields


We now choose $h \neq 0$, $\Gamma \neq 0$, and $J_{ij} = J_0/N$ for all $i \neq j$, so that the quantum
Hamiltonian is $H = -(J_0/N)\sum_{i<j}\sigma_i^z\sigma_j^z - \sum_i(h\sigma_i^z + \Gamma\sigma_i^x)$. This is known as the
Husimi-Temperley-Curie-Weiss model in a transverse field [11]. In the statics, after
some simple manipulations and using the short-hand $Dz = (2\pi)^{-1/2}e^{-z^2/2}\,dz$, we find:


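\[ Z_M(\beta) = e^{\frac{1}{2}NM\log[\frac{1}{2}\sinh(2\beta\Gamma/M)] - \frac{1}{2}\beta J_0} \int\Big[\prod_{k=1}^{M} Dz_k\Big] \Big\{\mathrm{Tr}\prod_{k=1}^{M} K\big(z_k\sqrt{M/(\beta J_0 N)}\big)\Big\}^N, \tag{12.48} \]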


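with the non-symmetric transfer matrix

\[ K(x) = \begin{pmatrix} e^{B + \beta h/M + \beta J_0 x/M} & e^{-B + \beta J_0 x/M} \\ e^{-B - \beta J_0 x/M} & e^{B - \beta h/M - \beta J_0 x/M} \end{pmatrix}. \tag{12.49} \]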


We first turn to the statics of the model. It is not immediately clear whether or not
the limits $N, M \to \infty$ in Eq. (12.48) commute. Upon taking the limit $N \to \infty$ first,
one obtains via steepest descent integration:


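\[ \lim_{N\to\infty} f_{N,M} = -\frac{M}{2\beta}\log\Big[\frac{1}{2}\sinh\Big(\frac{2\beta\Gamma}{M}\Big)\Big] - \frac{1}{\beta}\,\mathrm{extr}_x\Big\{\log\mathrm{Tr}\prod_{k=1}^{M} K(x_k) - \frac{\beta J_0}{2M}\sum_{k=1}^{M} x_k^2\Big\}. \tag{12.50} \]

We find the derivatives of the quantity $\Psi(x)$ to be extremized, with $\bar{\delta}_{ab} = 1 - \delta_{ab}$:

\[ \frac{\partial\Psi}{\partial x_q} = \frac{\beta J_0}{M}\left\{\frac{\mathrm{Tr}\prod_{k=1}^{M}(\delta_{kq}\sigma^z + \bar{\delta}_{kq})K(x_k)}{\mathrm{Tr}\prod_{k=1}^{M}K(x_k)} - x_q\right\}, \tag{12.51} \]

\[ \frac{\partial^2\Psi}{\partial x_q\partial x_r} = \Big(\frac{\beta J_0}{M}\Big)^2\left\{\frac{\mathrm{Tr}\prod_{k=1}^{M}(\delta_{kq}\sigma^z + \bar{\delta}_{kq})(\delta_{kr}\sigma^z + \bar{\delta}_{kr})K(x_k)}{\mathrm{Tr}\prod_{k=1}^{M}K(x_k)} - \frac{\mathrm{Tr}\prod_{k=1}^{M}(\delta_{kq}\sigma^z + \bar{\delta}_{kq})K(x_k)}{\mathrm{Tr}\prod_{k=1}^{M}K(x_k)}\cdot\frac{\mathrm{Tr}\prod_{k=1}^{M}(\delta_{kr}\sigma^z + \bar{\delta}_{kr})K(x_k)}{\mathrm{Tr}\prod_{k=1}^{M}K(x_k)}\right\} - \frac{\beta J_0}{M}\delta_{qr}. \tag{12.52} \]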


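In Trotter-symmetric solutions $x_k = m$ for all $k$, these derivatives simplify to

\[ \frac{\partial\Psi}{\partial x_q} = \frac{\beta J_0}{M}\left[\frac{\mathrm{Tr}[\sigma^z K^M(m)]}{\mathrm{Tr}[K^M(m)]} - m\right], \tag{12.53} \]

\[ \frac{\partial^2\Psi}{\partial x_q\partial x_r} = \Big(\frac{\beta J_0}{M}\Big)^2\left[\frac{\mathrm{Tr}[\sigma^z K^{|q-r|}(m)\,\sigma^z K^{M-|q-r|}(m)]}{\mathrm{Tr}[K^M(m)]} - \Big(\frac{\mathrm{Tr}[\sigma^z K^M(m)]}{\mathrm{Tr}[K^M(m)]}\Big)^2\right] - \frac{\beta J_0}{M}\delta_{qr}, \tag{12.54} \]

and $m$ is the solution of

\[ m = \frac{\mathrm{Tr}[\sigma^z K^M(m)]}{\mathrm{Tr}[K^M(m)]}. \tag{12.55} \]

Trotter symmetry-breaking bifurcations occur when $\mathrm{Det}[(\beta J_0/M)A - 1\!\!\mathrm{I}] = 0$, where

\[ A_{qr} = \frac{\mathrm{Tr}[\sigma^z K^{|q-r|}(m)\,\sigma^z K^{M-|q-r|}(m)]}{\mathrm{Tr}[K^M(m)]} - m^2. \tag{12.56} \]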


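We introduce the symmetric matrix $Q(m) = e^{-\frac{1}{2}m(\beta J_0/M)\sigma^z} K(m)\, e^{\frac{1}{2}m(\beta J_0/M)\sigma^z}$ with
eigenvalues $\lambda_\pm(m)$ and orthogonal eigenbasis $|\pm\rangle$. Now for any $\ell \in \mathbb{N}$ we have

\[ K^\ell(m) = e^{\frac{1}{2}m(\beta J_0/M)\sigma^z}\Big(\lambda_+^\ell(m)\,|+\rangle\langle +| + \lambda_-^\ell(m)\,|-\rangle\langle -|\Big)\, e^{-\frac{1}{2}m(\beta J_0/M)\sigma^z}, \tag{12.57} \]

and hence, with the short-hand $\sigma^z_{ab} = \langle a|\sigma^z|b\rangle$ and $\phi = \lambda_-(m)/\lambda_+(m) \in (-1, 1)$:

\[ \frac{\mathrm{Tr}[\sigma^z K^M(m)]}{\mathrm{Tr}[K^M(m)]} = \frac{\sigma^z_{++} + \sigma^z_{--}\,\phi^M}{1 + \phi^M}, \tag{12.58} \]

\[ A_{qr} = \frac{(\sigma^z_{+-})^2\big[\phi^{|q-r|} + \phi^{M-|q-r|}\big] + (\sigma^z_{++})^2 + (\sigma^z_{--})^2\phi^M}{1 + \phi^M} - m^2. \tag{12.59} \]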
1+фМ 
Since A has a Toeplitz form, we know its eigenvalues: 
2:12, е M T= 2 

Е М = = dee) (12.60) 


~ 14” 1+ф?—2@ф соз(2л(К—1)/М)` 


Finally we need to diagonalize $Q(m)$ for large $M$. This gives:

\[ Q(m) = \begin{pmatrix} e^{B + \beta(h+J_0 m)/M} & e^{-B} \\ e^{-B} & e^{B - \beta(h+J_0 m)/M} \end{pmatrix}, \tag{12.61} \]
ha (m) = еей Prem Heo. (12.62) 
: p. L + (h+ Jom)? 4 T? 

jim |+) = c aa (h+ Jom) + V'h Jom)? er? , (12.63) 


1 
(т 
Суту = V2 di my +Г? + (h+ Jom) (h+ Jom)? +? | ? (12.64) 


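It follows that

\[ \phi = e^{-\frac{2\beta}{M}\sqrt{(h+J_0 m)^2+\Gamma^2}} + O(M^{-2}). \tag{12.65} \]

Hence $\lim_{M\to\infty}\phi = 1$, $\lim_{M\to\infty}\phi^M = \exp[-2\beta\sqrt{(h+J_0 m)^2+\Gamma^2}]$,
$\lim_{M\to\infty}\sigma^z_{++} = -\lim_{M\to\infty}\sigma^z_{--} = (h+J_0 m)/\sqrt{(h+J_0 m)^2+\Gamma^2}$, and
$\lim_{M\to\infty}\sigma^z_{+-} = \Gamma/\sqrt{(h+J_0 m)^2+\Gamma^2}$. The equation for the magnetization $m$ and the eigenvalues of
$A$ thereby become

\[ m = \frac{(h+J_0 m)\tanh[\beta\sqrt{(h+J_0 m)^2+\Gamma^2}]}{\sqrt{(h+J_0 m)^2+\Gamma^2}}, \tag{12.66} \]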

_ Г? tanh[B//(h+Jom)2 - T?] [142 


(h+ Jom? 4- T? 


Jim =. = (12.67) 


М->оо 1—ф2 


Since all $a_k$ are bounded for large $M$, the condition $\beta J_0 a_k/M = 1$ for bifurcations
away from the Trotter-symmetric state is never met, indicating that the
state described by Eq. (12.66) is the physical one. The free energy density
$f = \lim_{M\to\infty}\lim_{N\to\infty} f_{N,M}$ is

\[ f = \frac{1}{2}J_0 m^2 - \lim_{M\to\infty}\left\{\frac{M}{2\beta}\log\Big[\frac{1}{2}\sinh\Big(\frac{2\beta\Gamma}{M}\Big)\Big] + \frac{1}{\beta}\log\big(\lambda_+^M(m) + \lambda_-^M(m)\big)\right\} = \frac{1}{2}J_0 m^2 - \frac{1}{\beta}\log\Big[2\cosh\big(\beta\sqrt{(h+J_0 m)^2+\Gamma^2}\big)\Big]. \tag{12.68} \]

Extremizing the expression in Eq. (12.68) over $m$ reproduces Eq. (12.66).
We return to Eq. (12.48), and now seek to take the Trotter limit $M \to \infty$ first. The
complexities are all in the evaluation for large $M$ of the quantity


VY еВ +Вһ/М —В N 
ZM - [П>] [stre B "e E emu) || . (12.69) 


This can be analyzed using random field Ising chain techniques [18]. Alternatively, 
we can use the fact that in summations of the form Ук Zk, each zę effectively scales 
as O(M~?), enabling us to use e~ 8 = /tanh (BT /M) and a modified version of the 
Trotter identity, viz. [ [, (е*/Ме”/М\ = ем Улем +Y to derive 
M M N
Blo gz 1 
Zu = еМ fiu Da In] П ev ate (1+ nose OG )| | 
k=l k=l 
dm 1 2 z x чу] 
= J/BJoNeNM? | ———е—:#85^+т {Tr ОИ 3 ‚ (12.70) 
J 27 
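The free energy density $f = \lim_{N\to\infty}\lim_{M\to\infty} f_{N,M}$ then becomes

\[ f = \mathrm{extr}_m\left\{\frac{1}{2}J_0 m^2 - \frac{1}{\beta}\log\Big(e^{\beta\mu_+(m)} + e^{\beta\mu_-(m)}\Big)\right\}, \tag{12.71} \]

in which $\mu_\pm(m)$ are the eigenvalues of the matrix $L(m) = (h+J_0 m)\sigma^z + \Gamma\sigma^x$:

\[ L(m) = \begin{pmatrix} h+J_0 m & \Gamma \\ \Gamma & -(h+J_0 m) \end{pmatrix}, \qquad \mu_\pm(m) = \pm\sqrt{(h+J_0 m)^2+\Gamma^2}. \tag{12.72} \]

We now recover Eqs. (12.66, 12.68), so the limits $N \to \infty$ and $M \to \infty$ can be
interchanged:

\[ f = \mathrm{extr}_m\left\{\frac{1}{2}J_0 m^2 - \frac{1}{\beta}\log\Big[2\cosh\big(\beta\sqrt{(h+J_0 m)^2+\Gamma^2}\big)\Big]\right\}. \tag{12.73} \]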
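The equilibrium state can be obtained numerically from Eq. (12.66), and one can check that it is a stationary point of the expression in Eq. (12.73). The sketch below uses a damped fixed-point iteration; the solver, its starting point and damping, and the parameter values are illustrative assumptions, and $\Gamma \neq 0$ is assumed.

```python
import math

def rhs_magnetization(m, beta, gamma, h, J0):
    """Right-hand side of the self-consistency equation (12.66)."""
    field = h + J0 * m
    r = math.sqrt(field * field + gamma * gamma)
    return field * math.tanh(beta * r) / r

def solve_magnetization(beta, gamma, h, J0, m0=0.5, damping=0.5,
                        tol=1e-12, max_iter=10000):
    """Damped fixed-point iteration m <- (1 - d) m + d * rhs(m)."""
    m = m0
    for _ in range(max_iter):
        m_new = (1.0 - damping) * m + damping * rhs_magnetization(m, beta, gamma, h, J0)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def free_energy(m, beta, gamma, h, J0):
    """The expression inside the extremum of Eq. (12.73)."""
    r = math.sqrt((h + J0 * m) ** 2 + gamma * gamma)
    return 0.5 * J0 * m * m - math.log(2.0 * math.cosh(beta * r)) / beta

if __name__ == "__main__":
    beta, gamma, h, J0 = 2.0, 0.5, 0.1, 1.0
    m = solve_magnetization(beta, gamma, h, J0)
    eps = 1e-6
    slope = (free_energy(m + eps, beta, gamma, h, J0)
             - free_energy(m - eps, beta, gamma, h, J0)) / (2.0 * eps)
    print("m =", m, " df/dm =", slope)
```

Because $\partial f/\partial m = J_0[m - \mathrm{rhs}(m)]$, the numerical slope vanishes at the solution. For other parameter values Eq. (12.66) may have several solutions, and the iteration then returns the one reached from the chosen starting point.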


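We next turn to the DRT dynamics. The energy and the usual initial conditions
can once more be expressed in terms of $\{m_k, \mathcal{E}_k\}$ (slice-dependent observables)
or $(m, \mathcal{E})$ (slice-independent ones). We define the short-hand
$Q_\pm(m) = \frac{1}{2}\tanh(\beta(J_0 m + h)/M + 2B) \pm \frac{1}{2}\tanh(\beta(J_0 m + h)/M - 2B) \in (-1, 1)$. Upon inserting
Eqs. (12.27, 12.28) and Eqs. (12.31, 12.32) into Eq. (12.21), with the fields
$h_{ik}(s) = M^{-1}[h + J_0 m_k(s)] + (B/\beta)(s_{i,k+1} + s_{i,k-1}) + O(N^{-1})$, and using expressions
such as $\tanh[a + b(s+s')] = \frac{1}{4}(1+s)(1+s')\tanh(a+2b) + \frac{1}{4}(1-s)(1-s')\tanh(a-2b) + \frac{1}{2}(1-ss')\tanh(a)$,
one finds the following descriptions:

• Trotter slice-dependent observables
Our observables are $m_q(s) = N^{-1}\sum_i s_{iq}$ and $\mathcal{E}_q(s) = N^{-1}\sum_i s_{iq}s_{i,q+1}$, for $q = 1 \ldots M$,
and we must take the limit $N \to \infty$ before $M \to \infty$. We note that

\[ \tanh(\beta h_{ik}(s)) = \frac{1}{2}(1 + s_{i,k+1}s_{i,k-1})\,Q_+(m_k(s)) + \frac{1}{2}(s_{i,k+1} + s_{i,k-1})\,Q_-(m_k(s)) + \frac{1}{2}(1 - s_{i,k+1}s_{i,k-1})\tanh\big(\beta(h + J_0 m_k(s))/M\big), \tag{12.74} \]

so with the correlators $C_k$ in Eq. (12.39) the dynamical laws take the form

\[ \tau\frac{d}{dt}m_q = \frac{1}{2}(1 + C_q)\,Q_+(m_q) + \frac{1}{2}(m_{q+1} + m_{q-1})\,Q_-(m_q) - m_q + \frac{1}{2}(1 - C_q)\tanh\Big(\frac{\beta}{M}(h + J_0 m_q)\Big), \tag{12.75} \]

\[ \tau\frac{d}{dt}\mathcal{E}_q = \frac{1}{2}(m_{q+1} + m_{q-1})\,Q_+(m_q) + \frac{1}{2}(1 + C_q)\,Q_-(m_q) + \frac{1}{2}(m_q + m_{q+2})\,Q_+(m_{q+1}) + \frac{1}{2}(1 + C_{q+1})\,Q_-(m_{q+1}) \]
\[ \qquad\qquad + \frac{1}{2}(m_{q+1} - m_{q-1})\tanh\Big(\frac{\beta}{M}(h + J_0 m_q)\Big) + \frac{1}{2}(m_q - m_{q+2})\tanh\Big(\frac{\beta}{M}(h + J_0 m_{q+1})\Big) - 2\mathcal{E}_q. \tag{12.76} \]

For slice-independent initial conditions, where $m_q = m$ and $\mathcal{E}_q = \mathcal{E}$, this becomes

\[ \tau\frac{d}{dt}m = \frac{1}{2}(1 + C)\,Q_+(m) + m\,Q_-(m) - m + \frac{1}{2}(1 - C)\tanh\Big(\frac{\beta}{M}(h + J_0 m)\Big), \tag{12.77} \]

\[ \tau\frac{d}{dt}\mathcal{E} = 2m\,Q_+(m) + (1 + C)\,Q_-(m) - 2\mathcal{E}, \tag{12.78} \]

with the correlator $C$ in Eq. (12.43).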


• Trotter slice-independent observables
For the choice $(m, \mathcal{E})$ there is no constraint on the order of limits, but the quantities
$m_k(s)$ appearing inside $\tanh(\beta h_{ik}(s))$ can no longer be replaced by deterministic
macroscopic observables, but must now be calculated. Using Trotter slice permutation
symmetry wherever possible, one finds

\[ \tau\frac{d}{dt}m = \frac{1}{2M}\sum_{k=1}^{M}\Big\langle [1 + C_k(s)]\,Q_+(m_k(s)) + [m_{k+1}(s) + m_{k-1}(s)]\,Q_-(m_k(s)) + [1 - C_k(s)]\tanh\big(\beta(h + J_0 m_k(s))/M\big)\Big\rangle_{\Omega;t} - m, \tag{12.79} \]

\[ \tau\frac{d}{dt}\mathcal{E} = \frac{1}{M}\sum_{k=1}^{M}\Big\langle [m_{k+1}(s) + m_{k-1}(s)]\,Q_+(m_k(s))\Big\rangle_{\Omega;t} + \frac{1}{M}\sum_{k=1}^{M}\Big\langle [1 + C_k(s)]\,Q_-(m_k(s))\Big\rangle_{\Omega;t} - 2\mathcal{E}, \tag{12.80} \]

with $C_k(s) = N^{-1}\sum_i s_{i,k+1}s_{i,k-1}$. For large $M$ and $N$, and in view of the interchangeability
of the limits $M \to \infty$ and $N \to \infty$ in the equilibrium calculation,
we may anticipate (and can indeed show) that we can neglect the fluctuations in
the values of $\{m_k(s)\}$ and simply replace $m_k(s) \to m(s) + o(1)$ in the right-hand
sides of the above equations, upon which these simplify to Eqs. (12.77, 12.78).
12.5 Link Between Statics and Dynamics


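We now show that for $M \to \infty$, the stationary state of Eqs. (12.77, 12.78) reproduces
the equilibrium result in Eq. (12.66) as expected. The fixed-point equations of Eqs.
(12.77, 12.78) are

\[ m = \frac{1}{2}(1 + C)\,Q_+(m) + m\,Q_-(m) + \frac{1}{2}(1 - C)\tanh\Big(\frac{\beta}{M}(h + J_0 m)\Big), \tag{12.81} \]

\[ \mathcal{E} = m\,Q_+(m) + \frac{1}{2}(1 + C)\,Q_-(m), \tag{12.82} \]

with the correlator $C = C(m, \mathcal{E}) \in (-1, 1)$ to be solved from

\[ C = \frac{\sum_{s_1\ldots s_M} e^{\sum_{k=1}^{M}(x s_k + y s_k s_{k+1})}\, s_1 s_3}{\sum_{s_1\ldots s_M} e^{\sum_{k=1}^{M}(x s_k + y s_k s_{k+1})}}, \tag{12.83} \]

\[ m = \frac{1}{M}\frac{\partial\log Z}{\partial x}, \qquad \mathcal{E} = \frac{1}{M}\frac{\partial\log Z}{\partial y}, \qquad Z(x, y) = \sum_{s_1\ldots s_M} e^{\sum_{k=1}^{M}(x s_k + y s_k s_{k+1})}. \tag{12.84} \]

We compute $Z(x, y)$ via the transfer matrix $K(x, y)$ with elements $K_{ss'} = e^{\frac{1}{2}x(s+s') + yss'}$.
This gives $Z(x, y) = \lambda_+^M(x, y) + \lambda_-^M(x, y)$, where $\lambda_\pm(\cdot)$ are the eigenvalues of $K(\cdot)$,

\[ \lambda_\pm(x, y) = e^{y}\Big(\cosh(x) \pm \sqrt{\sinh^2(x) + e^{-4y}}\Big). \tag{12.85} \]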


For the equilibrium values of $(m, \mathcal{E})$, Eq. (12.84) are solved by

\[ x = \beta(h + J_0 m)/M, \qquad y = B = -\frac{1}{2}\log\tanh\Big(\frac{\beta\Gamma}{M}\Big), \qquad \text{so } e^{-4y} = \tanh^2\Big(\frac{\beta\Gamma}{M}\Big). \tag{12.86} \]

This claim is confirmed by substituting these as ansätze into the expressions given
in the appendix. The key ingredient $\phi = \lambda_-/\lambda_+$ of our formulae then becomes

\[ \log\phi = -\frac{2\beta}{M}\sqrt{(h + J_0 m)^2 + \Gamma^2} + O(M^{-3}). \tag{12.87} \]

Hence for $M \to \infty$ the formulae for $m$ and $\mathcal{E}$ in Eq. (12.84) become

\[ m = \frac{(h + J_0 m)\tanh[\beta\sqrt{(h + J_0 m)^2 + \Gamma^2}]}{\sqrt{(h + J_0 m)^2 + \Gamma^2}}, \qquad \mathcal{E} = 1, \tag{12.88} \]

in which we recognize (12.66). For large $M$ one finds $Q_+(m) = O(M^{-3})$ and
$Q_-(m) = 1 - 2(\beta\Gamma/M)^2 + O(M^{-3})$, so expansion of the fixed-point equations gives
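\[ m = M(1 - C)\,\frac{h + J_0 m}{4\beta\Gamma^2} + O(M^{-1}), \tag{12.89} \]

\[ \mathcal{E} = \frac{1}{2}(1 + C)\big[1 - 2(\beta\Gamma/M)^2\big] + O(M^{-3}). \tag{12.90} \]

The first equation implies that $C = 1 - \tilde{C}/M$ for $M \to \infty$, with $\tilde{C} = O(1)$. In turn,
this gives $\mathcal{E} = 1 - \tilde{C}/2M + O(M^{-2})$. What is left in our proof is to show that $m$ obeys

\[ m = \frac{h + J_0 m}{4\beta\Gamma^2}\lim_{M\to\infty} M(1 - C). \tag{12.91} \]

We hence compute the correlator $C$ to order $M^{-1}$, using the identities in the appendix:

\[ C = \langle +|\sigma^z|+\rangle^2 + \frac{\cosh[(\frac{1}{2}M - 2)\log\phi]}{\cosh[\frac{1}{2}M\log\phi]}\Big(1 - \langle +|\sigma^z|+\rangle^2\Big) \]
\[ \phantom{C} = \frac{(h + J_0 m)^2}{(h + J_0 m)^2 + \Gamma^2} + \frac{\cosh[\beta(1 - 4/M)\sqrt{(h + J_0 m)^2 + \Gamma^2}]}{\cosh[\beta\sqrt{(h + J_0 m)^2 + \Gamma^2}]}\cdot\frac{\Gamma^2}{(h + J_0 m)^2 + \Gamma^2} + O\Big(\frac{1}{M^2}\Big) \]
\[ \phantom{C} = 1 - \frac{1}{M}\,\frac{4\beta\Gamma^2}{\sqrt{(h + J_0 m)^2 + \Gamma^2}}\tanh\Big[\beta\sqrt{(h + J_0 m)^2 + \Gamma^2}\Big] + O\Big(\frac{1}{M^2}\Big). \tag{12.92} \]

We can now read off the value of $\tilde{C}$, and the condition in Eq. (12.91) is found to
reduce to Eq. (12.66), so that it is indeed satisfied. This completes the demonstration
that for large $M$, the macroscopic Eqs. (12.77, 12.78) indeed have the equilibrium
state as their fixed-point.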


12.6 Evolution on Adiabatically Separated Timescales


We return to the dynamical laws in Eqs. (12.77, 12.78). As noted earlier, these exhibit
a divergent relaxation time for the magnetization for large $M$, suggesting that the
dynamics have distinct phases. The first phase is studied by choosing $\tau = O(1)$.
Using

\[ Q_+(m) = \frac{4\beta^3\Gamma^2}{M^3}(h + J_0 m) + O(M^{-4}), \qquad Q_-(m) = 1 - \frac{2\beta^2\Gamma^2}{M^2} + O(M^{-3}), \tag{12.93} \]

we here find that

\[ m = m_0 + O(M^{-1}), \qquad \tau\frac{d}{dt}\mathcal{E} = 1 + C(m_0, \mathcal{E}) - 2\mathcal{E} + O(M^{-1}). \tag{12.94} \]
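So on these timescales the magnetization does not change, whereas the Trotter energy
evolves to the solution of the fixed-point equation $\mathcal{E} = \frac{1}{2} + \frac{1}{2}C(m_0, \mathcal{E})$, in which
$C(m_0, \mathcal{E})$ is to be solved from the following equations according to the appendix:

\[ m_0 = -\frac{\sinh(x)\tanh[\frac{1}{2}M\log\phi]}{\sqrt{\sinh^2(x) + e^{-4y}}}, \tag{12.95} \]

\[ \mathcal{E} = \frac{\sinh^2(x)}{\sinh^2(x) + e^{-4y}} + \frac{\cosh[(\frac{1}{2}M - 1)\log\phi]}{\cosh[\frac{1}{2}M\log\phi]}\cdot\frac{e^{-4y}}{\sinh^2(x) + e^{-4y}}, \tag{12.96} \]

\[ C = \frac{\sinh^2(x)}{\sinh^2(x) + e^{-4y}} + \frac{\cosh[(\frac{1}{2}M - 2)\log\phi]}{\cosh[\frac{1}{2}M\log\phi]}\cdot\frac{e^{-4y}}{\sinh^2(x) + e^{-4y}}, \tag{12.97} \]

with $\phi = [\cosh(x) - \sqrt{\sinh^2(x) + e^{-4y}}]/[\cosh(x) + \sqrt{\sinh^2(x) + e^{-4y}}]$. Inspection of
these equations reveals that the correct scaling with $M$ requires $(x, e^{-2y}) = (u, v)/M$,
with $u, v = O(1)$. Now $\frac{1}{2}M\log\phi = -\sqrt{u^2+v^2} + O(M^{-2})$, $\mathcal{E} = 1 - \tilde{\mathcal{E}}/M + O(M^{-2})$,
and $C = 1 - 2\tilde{\mathcal{E}}/M + O(M^{-2})$, in which $(u, v)$ are solved from

\[ m_0 = \frac{u\tanh(\sqrt{u^2+v^2})}{\sqrt{u^2+v^2}}, \qquad \tilde{\mathcal{E}} = \frac{2v^2\tanh(\sqrt{u^2+v^2})}{\sqrt{u^2+v^2}}. \tag{12.98} \]

Although the fixed-point equation for $\mathcal{E}$ is now solved to order $O(M^{-1})$, computation
of $\tilde{\mathcal{E}}$ requires higher orders of $M$. Once $\mathcal{E} = 1 - \tilde{\mathcal{E}}/M + O(M^{-2})$ and $C(m, \mathcal{E}) = 1 - 2\tilde{\mathcal{E}}/M + O(M^{-2})$,
we find $dm/dt = O(M^{-2})$ and $d\mathcal{E}/dt = O(M^{-2})$, so nothing
evolves further macroscopically on these finite timescales.

Since we need $\tau = O(M^{-2})$ to probe the macroscopic evolution of the system on
larger timescales, spin flips in the Trotter system are attempted on unit timescales of
$O(M^2 N)$.⁶ With the choice $\tau = M^{-2}$, and upon defining $M(1 - \mathcal{E}) = \tilde{\mathcal{E}}$ and
$M(1 - C) = \tilde{C}$, the macroscopic laws (12.77, 12.78) become

\[ \frac{d}{dt}m = \frac{1}{2}\tilde{C}\beta(h + J_0 m) - 2m\beta^2\Gamma^2 + O\Big(\frac{1}{M}\Big), \tag{12.99} \]

\[ \frac{d}{dt}\tilde{\mathcal{E}} = 4M\beta^2\Gamma^2 - M^2(2\tilde{\mathcal{E}} - \tilde{C}) - 8\beta^3\Gamma^2 m(J_0 m + h) - 2\beta^2\Gamma^2\tilde{C} + O\Big(\frac{1}{M}\Big). \tag{12.100} \]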


The quantity $\tilde{C} = \tilde{C}(m, \tilde{\mathcal{E}})$ is to be solved together with $(x, y)$ from Eqs. (12.95,
12.96, 12.97). The relevant scaling is still $(x, e^{-2y}) = (u, v)/M$, with $u, v = O(1)$,
but according to Eq. (12.100) we now need more than just the leading order in $M^{-1}$.
Using


эн у? 
logó = DAMM + O(M7), (12.101) 


6 This reflects the high energy cost of breaking Trotter symmetry to induce magnetization changes. 
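the equations for $\mathcal{E}$ and $C$ take the form $\mathcal{E} = \Xi_1(u, v)$ and $C = \Xi_2(u, v)$, where

\[ \Xi_r(u, v) = \frac{\sinh^2(u/M)}{\sinh^2(u/M) + v^2/M^2} + \frac{v^2/M^2}{\sinh^2(u/M) + v^2/M^2}\,F_r(u, v), \tag{12.102} \]

\[ F_r(u, v) = \frac{\cosh[(\frac{1}{2}M - r)\log\phi]}{\cosh[\frac{1}{2}M\log\phi]}. \tag{12.103} \]

Now, after tedious but straightforward expansion in $M^{-1}$ one finds that

\[ F_r(u, v) = 1 - \frac{2r}{M}\sqrt{u^2+v^2}\,\tanh\big(\sqrt{u^2+v^2}\big) + \frac{2r^2}{M^2}(u^2+v^2) + O(M^{-3}). \tag{12.104} \]

Hence

\[ \Xi_r(u, v) = 1 - \frac{2rv^2}{M}\,\frac{\tanh(\sqrt{u^2+v^2})}{\sqrt{u^2+v^2}} + \frac{2r^2v^2}{M^2} + O(M^{-3}). \tag{12.105} \]

It follows that the equations for $\tilde{\mathcal{E}} = M(1 - \mathcal{E})$ and $\tilde{C} = M(1 - C)$ take the form

\[ \tilde{\mathcal{E}} = \frac{2v^2\tanh(\sqrt{u^2+v^2})}{\sqrt{u^2+v^2}} - \frac{2v^2}{M} + O(M^{-2}), \qquad \tilde{C} = 2\tilde{\mathcal{E}} - \frac{4v^2}{M} + O(M^{-2}). \tag{12.106} \]

The dynamical equations then become

\[ \frac{d}{dt}m = \tilde{\mathcal{E}}\beta(h + J_0 m) - 2m\beta^2\Gamma^2 + O\Big(\frac{1}{M}\Big), \tag{12.107} \]

\[ \frac{d}{dt}\tilde{\mathcal{E}} = 4M(\beta^2\Gamma^2 - v^2) - 8\beta^3\Gamma^2 m(J_0 m + h) - 4\beta^2\Gamma^2\tilde{\mathcal{E}} + O\Big(\frac{1}{M}\Big). \tag{12.108} \]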
What remains is to express $v$ in terms of $(m, \tilde{\mathcal{E}})$, in leading two orders, by solving
Eq. (12.106) for $\tilde{\mathcal{E}}$ alongside our equation for $m$. The latter is

\[ m = \frac{u\tanh(\sqrt{u^2+v^2})}{\sqrt{u^2+v^2}} + O(M^{-2}). \tag{12.109} \]

Equation (12.106) shows that $v = 0$ corresponds to $\tilde{\mathcal{E}} = 0$, and that $\tilde{\mathcal{E}}$ increases with
$v^2$. On intermediate timescales $\tau = M^{-1}$, we have

\[ \frac{d}{dt}m = O\Big(\frac{1}{M}\Big), \qquad \frac{d}{dt}\tilde{\mathcal{E}} = 4(\beta^2\Gamma^2 - v^2) + O\Big(\frac{1}{M}\Big), \tag{12.110} \]

where $m$ remains constant and $\tilde{\mathcal{E}}$ evolves toward the value for which $v = \beta\Gamma + O(M^{-1})$
(which is also the equilibrium value for $v$). Thus, in the dynamical equations
in Eqs. (12.107, 12.108) describing the process on timescales of $\tau = M^{-2}$ we must
substitute $v^2 = \beta^2\Gamma^2 + O(M^{-1})$. Thus, during the slow process where $m$ evolves
we always have

\[ \tilde{\mathcal{E}} = 2\beta^2\Gamma^2 m/u. \tag{12.111} \]

Upon insertion into Eq. (12.107), this results in a closed dynamical equation for $m$
only:

\[ \frac{d}{dt}m = 2\beta^2\Gamma^2\left[\frac{\beta(h + J_0 m)\tanh(\sqrt{u^2 + \beta^2\Gamma^2})}{\sqrt{u^2 + \beta^2\Gamma^2}} - m\right], \tag{12.112} \]

without requiring additional approximations, and with $u$ to be solved from⁷

\[ m = \frac{u\tanh(\sqrt{u^2 + \beta^2\Gamma^2})}{\sqrt{u^2 + \beta^2\Gamma^2}}. \tag{12.113} \]

In equilibrium we recover from Eqs. (12.112, 12.113) the correct equilibrium state
in Eq. (12.88), with $u = \beta(J_0 m + h)$. Comparison with Eq. (10) in [7] reveals, apart
from a harmless difference in time units, that the approximation of [7] (used also in
[8-10]) implies replacing $u$ at any time by $\beta(J_0 m + h)$. While this indeed holds in
equilibrium, the approximation may be dangerous far from equilibrium.

In Fig. 12.1 we test the predictions of Eqs. (12.112, 12.113) against numerical
simulations of the process in Eqs. (12.11, 12.12). First, the approximate co-location of
the simulation curves for widely varying values of $M$ confirms that $\tau = O(1/M^2)$
(inferred from the dynamical theory) indeed captures the characteristic timescale
of the macroscopic process. Second, while not showing perfect agreement with the
simulation data, which is not expected in view of the probability equipartitioning
assumption used to close the macroscopic dynamical equations, away from stationarity
the full theory in Eqs. (12.112, 12.113) is reasonably accurate and improves
upon the approximation proposed in [7].
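The closed law in Eqs. (12.112, 12.113) is straightforward to integrate numerically. The following is a minimal sketch, not the authors' code: it solves Eq. (12.113) for $u$ by bisection at every step and advances $m$ with a forward-Euler scheme. The step size, the bracket-expansion strategy and the parameter values are assumptions, $\Gamma \neq 0$ and $|m| < 1$ are required, and the solution-selection rule of footnote 7 is not implemented.

```python
import math

def g(u, c):
    """Right-hand side of Eq. (12.113) with c = beta*Gamma."""
    r = math.sqrt(u * u + c * c)
    return u * math.tanh(r) / r

def solve_u(m, c, tol=1e-13):
    """Solve m = g(u, c) for u by bisection on an expanding bracket."""
    if not -1.0 < m < 1.0:
        raise ValueError("bisection assumes |m| < 1")
    lo, hi = -1.0, 1.0
    while g(lo, c) > m:
        lo *= 2.0
    while g(hi, c) < m:
        hi *= 2.0
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if g(mid, c) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dm_dt(m, beta, gamma, h, J0):
    """Eq. (12.112), in time units where tau = M^-2."""
    c = beta * gamma
    u = solve_u(m, c)
    r = math.sqrt(u * u + c * c)
    return 2.0 * c * c * (beta * (h + J0 * m) * math.tanh(r) / r - m)

def integrate(m0, beta, gamma, h, J0, t_max=10.0, dt=0.01):
    """Forward-Euler integration; returns a list of (t, m) pairs."""
    m, t, out = m0, 0.0, [(0.0, m0)]
    for _ in range(int(round(t_max / dt))):
        m += dt * dm_dt(m, beta, gamma, h, J0)
        t += dt
        out.append((t, m))
    return out

if __name__ == "__main__":
    trajectory = integrate(0.0, beta=2.0, gamma=0.5, h=0.1, J0=1.0)
    print(trajectory[-1])
```

At the stationary point $u = \beta(J_0 m + h)$, so the final value of $m$ satisfies the equilibrium equation (12.66). Replacing `solve_u(m, c)` by `beta * (h + J0 * m)` inside `dm_dt` would give the approximation of [7] discussed above.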


12.7 Discussion 


In this chapter we aimed to explain the basic ideas and assumptions behind the DRT 
strategy for deriving and closing macroscopic dynamical equations, and its applica- 
tion to the types of spin systems used in quantum annealing with transverse fields. 


7 For certain values of $m$ and $\beta\Gamma$, Eq. (12.113) may have more than one solution $u$. In such cases
the physical solution is the one with the largest absolute value.
Fig. 12.1 Theory versus computer simulations of the microscopic process in Eqs. (12.11, 12.12) for
the Trotter representation of the system with Hamiltonian $H = -(J_0/N)\sum_{i<j}\sigma_i^z\sigma_j^z - \sum_i(h\sigma_i^z + \Gamma\sigma_i^x)$,
with $N = 10000$ and $M \in \{3, 12, 48, 192\}$. In all cases $J_0 = 1$, $\Gamma = T = 0.5$, and $\tau = 1/M^2$ (so
time units correspond to $NM^2$ attempted moves per spin). Left figure: magnetization versus time
for $h = 0.1$; right figure: the same for $h = 0.5$. The simulation data are shown as connected markers.
The black curve is the theoretical prediction, that is, the solution of Eqs. (12.112, 12.113). The light
blue curve is the approximated theory of [7], obtained by solving Eq. (12.112) with the equilibrium
value $u = \beta(J_0 m + h)$
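A simulation of the kind shown in Fig. 12.1 can be sketched in a few lines. The version below is an illustration under stated assumptions rather than the code used for the figure: it uses a much smaller system, random initial spins, discrete random-sequential updates with the flip probability $w_{ik}(s)$ of Eq. (12.12), and the local field of Eq. (12.13) with $J_{ij} = J_0/N$.

```python
import math
import random

def simulate(N=200, M=4, beta=2.0, gamma=0.5, h=0.5, J0=1.0, sweeps=200, seed=0):
    """Glauber dynamics for the Trotter system; returns m after each sweep."""
    rng = random.Random(seed)
    B = -0.5 * math.log(math.tanh(beta * gamma / M))
    # Random initial spins (an assumption of this sketch), stored as s[i][k].
    s = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
    slice_sum = [sum(s[i][k] for i in range(N)) for k in range(M)]
    history = []
    for _ in range(sweeps):
        for _ in range(N * M):  # one sweep = N*M attempted flips
            i, k = rng.randrange(N), rng.randrange(M)
            # Local field of Eq. (12.13), with the sum over j != i done via slice_sum.
            field = ((J0 / (N * M)) * (slice_sum[k] - s[i][k])
                     + (B / beta) * (s[i][(k + 1) % M] + s[i][(k - 1) % M])
                     + h / M)
            flip_prob = 0.5 * (1.0 - s[i][k] * math.tanh(beta * field))
            if rng.random() < flip_prob:
                s[i][k] = -s[i][k]
                slice_sum[k] += 2 * s[i][k]  # new value minus old value
        history.append(sum(slice_sum) / (N * M))
    return history

if __name__ == "__main__":
    magnetization = simulate()
    print(magnetization[0], magnetization[-1])
```

Keeping the per-slice spin sums up to date makes each attempted flip an O(1) operation, which is what allows large $N$ in practice. Comparing runs for several $M$ after rescaling time by $M^2$ would mimic the collapse of curves described for Fig. 12.1.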


We focused on technicalities relating to commutation of the limits $N \to \infty$ and
$M \to \infty$, the possible choices of macroscopic observables, the distinct $M$-dependent
timescales in the evolution of the Trotter system, and on how an additional approximation
made in earlier studies can be avoided, leading to a more precise dynamical
theory. We have tested the theoretical predictions of the theory against numerical
MCMC simulations of a ferromagnetic quantum system [11] with transverse external
fields in Trotter representation and found good agreement.

Since there was no disorder in the examples used in this text, we could work with 
the dynamical laws in Eq. (12.22). If, in contrast, there is disorder in the problem, 
the macroscopic laws need to be averaged over its realization, and the main tool is 
Eq. (12.24). For models with random interactions, performing this disorder average 
is, however, relatively painless and does not make the dynamical theory significantly 
more complicated. 

We hope that this introduction to the method may aid the development of further 
analytical studies of the macroscopic dynamics of quantum annealing, including 
models with time-dependent control parameters, more realistic quantum systems with 
disordered spin interactions or with interactions on finitely connected graphs, and 
more precise analytical descriptions in which the macroscopic dynamical observables 
are functions [14, 19, 20] instead of scalars. 
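Appendix: Mathematical Identities


Here we list some basic properties of relevant transfer matrices and expectation
values in the single-site Trotter system. The transfer matrix and its eigenvalues are

\[ K = \begin{pmatrix} e^{y+x} & e^{-y} \\ e^{-y} & e^{y-x} \end{pmatrix}, \qquad \lambda_\pm = e^{y}\Big[\cosh(x) \pm \sqrt{\sinh^2(x) + e^{-4y}}\Big]. \tag{12.114} \]

The corresponding normalized eigenvectors are

\[ |+\rangle = \frac{1}{\mathcal{N}}\Big(e^{-2y},\ \sqrt{\sinh^2(x) + e^{-4y}} - \sinh(x)\Big), \tag{12.115} \]

\[ |-\rangle = \frac{1}{\mathcal{N}}\Big(\sqrt{\sinh^2(x) + e^{-4y}} - \sinh(x),\ -e^{-2y}\Big), \tag{12.116} \]

\[ \mathcal{N}^2 = 2\sqrt{\sinh^2(x) + e^{-4y}}\,\Big(\sqrt{\sinh^2(x) + e^{-4y}} - \sinh(x)\Big). \tag{12.117} \]

From these expressions one can find $\langle\pm|\sigma^z|\pm\rangle = \pm\sinh(x)/\sqrt{\sinh^2(x) + e^{-4y}}$, and
compute the following observables (with $\phi = \lambda_-/\lambda_+$):

\[ \frac{\sum_{s_1\ldots s_M} s_1\prod_{k=1}^{M} K_{s_k s_{k+1}}}{\sum_{s_1\ldots s_M}\prod_{k=1}^{M} K_{s_k s_{k+1}}} = -\frac{\sinh(x)\tanh[\frac{1}{2}M\log\phi]}{\sqrt{\sinh^2(x) + e^{-4y}}}, \tag{12.118} \]

\[ \frac{\sum_{s_1\ldots s_M} s_1 s_2\prod_{k=1}^{M} K_{s_k s_{k+1}}}{\sum_{s_1\ldots s_M}\prod_{k=1}^{M} K_{s_k s_{k+1}}} = \frac{\sinh^2(x)}{\sinh^2(x) + e^{-4y}} + \frac{\cosh[(\frac{1}{2}M - 1)\log\phi]}{\cosh[\frac{1}{2}M\log\phi]}\cdot\frac{e^{-4y}}{\sinh^2(x) + e^{-4y}}, \tag{12.119} \]

\[ \frac{\sum_{s_1\ldots s_M} s_1 s_3\prod_{k=1}^{M} K_{s_k s_{k+1}}}{\sum_{s_1\ldots s_M}\prod_{k=1}^{M} K_{s_k s_{k+1}}} = \frac{\sinh^2(x)}{\sinh^2(x) + e^{-4y}} + \frac{\cosh[(\frac{1}{2}M - 2)\log\phi]}{\cosh[\frac{1}{2}M\log\phi]}\cdot\frac{e^{-4y}}{\sinh^2(x) + e^{-4y}}. \tag{12.120} \]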
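The identities in Eqs. (12.118)-(12.120) can be checked against explicit enumeration of a short periodic chain. The sketch below assumes $y > 0$ (so that $\phi > 0$ and $\log\phi$ is real) and $M \geq 3$; function names and test values are illustrative.

```python
import itertools
import math

def chain_averages_brute_force(x, y, M):
    """<s_1>, <s_1 s_2>, <s_1 s_3> for weight exp(sum_k [x s_k + y s_k s_{k+1}])."""
    Z = m1 = c12 = c13 = 0.0
    for s in itertools.product((-1, 1), repeat=M):
        w = math.exp(sum(x * s[k] + y * s[k] * s[(k + 1) % M] for k in range(M)))
        Z += w
        m1 += w * s[0]
        c12 += w * s[0] * s[1]
        c13 += w * s[0] * s[2]
    return m1 / Z, c12 / Z, c13 / Z

def chain_averages_formula(x, y, M):
    """The same three averages from Eqs. (12.118)-(12.120)."""
    root = math.sqrt(math.sinh(x) ** 2 + math.exp(-4.0 * y))
    log_phi = math.log((math.cosh(x) - root) / (math.cosh(x) + root))
    a = math.sinh(x) ** 2 / root ** 2      # <+|sigma^z|+>^2
    b = math.exp(-4.0 * y) / root ** 2     # 1 - <+|sigma^z|+>^2
    half = 0.5 * M * log_phi
    m1 = -math.sinh(x) / root * math.tanh(half)
    c12 = a + b * math.cosh(half - log_phi) / math.cosh(half)
    c13 = a + b * math.cosh(half - 2.0 * log_phi) / math.cosh(half)
    return m1, c12, c13

if __name__ == "__main__":
    print(chain_averages_brute_force(0.3, 0.4, 6))
    print(chain_averages_formula(0.3, 0.4, 6))
```

The product of matrix elements $K_{ss'} = e^{\frac{1}{2}x(s+s') + yss'}$ around the periodic chain equals the Boltzmann weight used in the enumeration, which is why the two routes agree.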
Chapter 13
Mean-Field Analysis of Sourlas Codes
with Adiabatic Reverse Annealing


Shunta Arai 


Abstract In this chapter, we analyze the typical performance of adiabatic reverse 
annealing (ARA) for Sourlas codes. Sourlas codes are representative error-correcting 
codes related to p-body spin-glass models and have a first-order phase transition for 
p > 2, which degrades the estimation performance. In the ARA formulation, we 
introduce the initial Hamiltonian which incorporates the prior information of the 
solution into a vanilla quantum annealing (QA) formulation. The ground state of the 
initial Hamiltonian represents the initial candidate solution. To avoid the first-order 
phase transition, we apply ARA to Sourlas codes. We evaluate the typical ARA 
performance for Sourlas codes using the replica method. We show that ARA can
avoid the first-order phase transition if we prepare a proper initial candidate
solution.


13.1 Introduction 


Problems in information processing have been studied analytically from the view- 
point of statistical mechanics [12]. Associative memory, Sourlas codes, code-division 
multiple-access (CDMA), and image restoration are very popular examples [5, 6, 21, 
24]. Many studies have focused on the degradation of the original signal or informa- 
tion due to noise. The noise can be physically regarded as thermal fluctuations. The 
original information can be estimated from the degraded data by tuning the strength 
of thermal fluctuations. 

In this chapter, we focus mainly on error-correcting codes such as Sourlas codes, 
which are described by p-body spin-glass problems [21]. The main idea of error- 
correcting codes is to add redundancy while sending information to decode the orig- 
inal signal from noisy outputs. In Sourlas codes, the original signal is encoded in 
the interactions of the spins. To estimate the original signal, we search for the ground
state of the Hamiltonian or compute the expectation value over the Gibbs-Boltzmann
distribution at a finite temperature.

In addition to thermal fluctuations, quantum fluctuations can also be used to infer 
the original information. Several studies have demonstrated that quantum fluctuations 
such as the transverse field do not necessarily enhance the performance of decoding 
for image restoration, Sourlas codes, or CDMA [2, 6, 15, 16]. The optimal estimation 
performance using quantum fluctuations is inferior to that using thermal fluctuations 
in Bayes-optimal cases. However, in some non-Bayes optimal cases, the estimation 
performance using finite quantum fluctuations and thermal fluctuations surpasses 
that using only thermal fluctuations; for example, when the assigned temperature is 
lower than the true noise scale. This implies the potential of combining quantum and 
thermal fluctuations for signal recovery problems. 

Signal estimation algorithms using quantum fluctuations are related to optimization
algorithms using quantum fluctuations, which are known as quantum annealing
(QA) [9] or adiabatic quantum computation (AQC) [3]. The QA algorithm is physically
implemented in the quantum annealer [7]. The quantum annealer has been
tested in numerous applications, including traffic optimization [11] and vehicle
control in factories [14].

In a closed system, the QA procedure is as follows. First, we set the initial state as
the trivial ground state of the transverse field term. Next, we gradually decrease the
strength of the transverse field. Following the Schrödinger equation, the trivial ground
state evolves adiabatically into a nontrivial ground state of the target Hamiltonian,
which corresponds to a solution of the combinatorial optimization problem. The
quantum adiabatic theorem indicates that the total computational time for searching 
the ground state is characterized by the minimum energy gap between the ground 
state and first excited state [23]. When the target Hamiltonian has a first-order phase 
transition, the computational time to find the ground state grows exponentially. 
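To make the role of the minimum energy gap concrete, the following sketch scans the gap of $s\hat{H}_0 + (1-s)\hat{H}_{\mathrm{TF}}$ by exact diagonalization. It is only an illustration under stated assumptions: the system is tiny ($N = 5$), the $p$-body couplings are drawn from a unit Gaussian rather than from a Sourlas channel, and the helper names (`site_operator`, `build_hamiltonians`, `gap`) are hypothetical.

```python
import itertools
import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def site_operator(op, site, n):
    """Kronecker product applying `op` at `site` and the identity elsewhere."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, op if k == site else I2)
    return out

def build_hamiltonians(n, p, rng):
    """Random p-body target Hamiltonian H0 and the transverse-field term H_TF."""
    sz = [site_operator(SZ, i, n) for i in range(n)]
    h0 = np.zeros((2 ** n, 2 ** n))
    for idx in itertools.combinations(range(n), p):
        term = np.eye(2 ** n)
        for i in idx:
            term = term @ sz[i]
        h0 -= rng.normal() * term          # assumed unit-variance Gaussian coupling
    h_tf = -sum(site_operator(SX, i, n) for i in range(n))
    return h0, h_tf

def gap(h0, h_tf, s):
    """Gap between the two lowest levels of s*H0 + (1-s)*H_TF."""
    levels = np.linalg.eigvalsh(s * h0 + (1.0 - s) * h_tf)
    return levels[1] - levels[0]

rng = np.random.default_rng(0)
h0, h_tf = build_hamiltonians(n=5, p=3, rng=rng)
s_grid = np.linspace(0.0, 1.0, 101)
gaps = np.array([gap(h0, h_tf, s) for s in s_grid])
min_gap = gaps.min()                       # controls the adiabatic run time
s_at_min = s_grid[gaps.argmin()]           # location of the bottleneck
```

At $s = 0$ the gap equals 2, the cost of flipping one spin against the transverse field. For such a small system the minimum gap stays finite; the exponential closing discussed above is a statement about large $N$.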

Reverse annealing (RA) is a protocol for restarting quantum dynamics from the 
final state of the standard QA procedure [17]. The RA algorithm can be used to 
avoid or mitigate the first-order phase transition and is classified into two methods: 
adiabatic reverse annealing (ARA) [13] and iterated reverse annealing (IRA) 
[26]. ARA and IRA are distinguished by how the final state is utilized. ARA incorporates
the final state through an initial Hamiltonian, whereas IRA uses it as the initial
condition of the dynamics.

In a recent study [2], ARA was applied to CDMA multiuser detection, where it can
avoid or mitigate the first-order phase transition of the CDMA model. In this chapter,
we apply ARA to Sourlas codes. Sourlas codes have a first-order phase transition
for p > 2. The existence of the first-order phase transition deteriorates the estimation
performance. We evaluate the typical performance of ARA for Sourlas codes
using the replica method. We demonstrate that ARA can avoid the first-order phase
transition of Sourlas codes if we prepare proper initial conditions.
13.2 Sourlas Codes Using Quantum Fluctuations


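Following a previous study [15], we formulate Sourlas codes using quantum fluctuations.
Sourlas codes are set up to send a set of products of $p$ spins,
$J^{0}_{i_1 \ldots i_p} = \xi_{i_1} \cdots \xi_{i_p}$, through a channel. The symbol
$\xi_i = \pm 1$ $(i = 1, \ldots, N)$ represents the original signal,
which is independently generated from the uniform distribution $P(\xi_i) = 1/2$. We
consider the Gauss channel as

$$
P(J_{i_1 \ldots i_p} \mid \{\xi\}) = \left( \frac{N^{p-1}}{J^{2} \pi p!} \right)^{1/2}
\exp\left\{ - \frac{N^{p-1}}{J^{2} p!}
\left( J_{i_1 \ldots i_p} - \frac{J_0 \, p!}{N^{p-1}} \xi_{i_1} \cdots \xi_{i_p} \right)^{2} \right\}, \tag{13.1}
$$

where $J$ and $J_0$ are hyperparameters. The ratio $J_0/J$ represents the signal-to-noise
ratio. The distribution $P(J_{i_1 \ldots i_p} \mid \{\xi\})$ is the conditional probability of the
output $J_{i_1 \ldots i_p}$ given the encoded signal $\xi_{i_1} \cdots \xi_{i_p}$. We infer the
original signal $\{\xi\}$ from the noisy outputs $\{J_{i_1 \ldots i_p}\}$. Using the Bayes formula,
we introduce the posterior probability for
the estimated signal $\sigma = \{\sigma_1, \ldots, \sigma_N\} \in \{\pm 1\}^{N}$ as

$$
P(\sigma \mid \{J_{i_1 \ldots i_p}\}) =
\frac{P(\{J_{i_1 \ldots i_p}\} \mid \sigma) P(\sigma)}{\sum_{\sigma} P(\{J_{i_1 \ldots i_p}\} \mid \sigma) P(\sigma)}, \tag{13.2}
$$

where $P(\{J_{i_1 \ldots i_p}\} \mid \sigma)$ and $P(\sigma)$ are the likelihood and prior distribution,
respectively. The summation over spin variables $\sum_{\sigma}$ runs over all possible
configurations. The likelihood can be expressed as

$$
P(\{J_{i_1 \ldots i_p}\} \mid \sigma) \propto
\exp\left( \beta \sum_{i_1 < \cdots < i_p} J_{i_1 \ldots i_p} \sigma_{i_1} \cdots \sigma_{i_p} \right), \tag{13.3}
$$

where $\beta$ is the inverse temperature and the summation $\sum_{i_1 < \cdots < i_p}$ runs over all
possible combinations of $p$ spins out of $N$ spins. According to Eqs. (13.2) and (13.3), the
posterior distribution can be written by using the Gibbs-Boltzmann distribution with
the classical Hamiltonian $H(\sigma)$, as follows:

$$
P(\sigma \mid \{J_{i_1 \ldots i_p}\}) = \frac{1}{Z} \exp\left\{ -\beta \left( H(\sigma) + H_{\mathrm{init}}(\sigma) \right) \right\}, \tag{13.4}
$$
$$
Z = \sum_{\sigma} \exp\left\{ -\beta \left( H(\sigma) + H_{\mathrm{init}}(\sigma) \right) \right\}, \tag{13.5}
$$
$$
H(\sigma) = - \sum_{i_1 < \cdots < i_p} J_{i_1 \ldots i_p} \sigma_{i_1} \cdots \sigma_{i_p}, \tag{13.6}
$$

where $Z$ is the partition function and $H_{\mathrm{init}}(\sigma)$ is the initial Hamiltonian, which
represents the prior information of the estimated signal. We generally assume that
the prior of the estimated signal follows a uniform distribution $P(\sigma) = 1/2^{N}$.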
To decode the original signal, one decoding strategy is the maximum a posteriori
(MAP) estimation, which corresponds to searching for the ground state of the classical
Hamiltonian of Sourlas codes in the limit of zero temperature. Another is the marginal
posterior mode (MPM) estimation, which corresponds to finding the expectation
value over the posterior distribution at a finite temperature. In the limit of zero
temperature, the MPM estimation is consistent with the MAP estimation. In this
chapter, we mainly consider the MPM estimation. The estimation performance can
be evaluated by the overlap between the original and estimated signal as

$$
M(\beta) = \mathrm{Tr}_{\xi} \int \prod_{i_1 < \cdots < i_p} dJ_{i_1 \ldots i_p} \,
P(\{J_{i_1 \ldots i_p}\} \mid \{\xi\}) P(\xi) \, \xi_i \, \mathrm{sgn} \langle \sigma_i \rangle, \tag{13.7}
$$

where $\langle \cdot \rangle$ is the expectation over the posterior distribution
$P(\sigma \mid \{J_{i_1 \ldots i_p}\})$. This quantity is expected to exhibit a "self-averaging"
property in the thermodynamic limit $N \to \infty$. This means that observables such as the
overlap, evaluated for a quenched realization of the data $\{J_{i_1 \ldots i_p}\}$ and $\xi$, are
equivalent to their expectation over the data distribution
$P(\xi) P(\{J_{i_1 \ldots i_p}\} \mid \{\xi\})$. In this case, the overlap can be expressed as
$\lim_{N \to \infty} M = [\xi_i \, \mathrm{sgn} \langle \sigma_i \rangle]$, where the bracket $[\cdot]$
indicates the expectation over the data distribution.
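The encoding, the Gauss channel, and the two decoding strategies can be illustrated on a toy instance by brute-force enumeration. This is a minimal sketch, assuming a uniform prior (so the initial Hamiltonian is dropped), a Gaussian channel with mean $J_0 p!/N^{p-1}$ and variance $J^2 p!/(2N^{p-1})$, and a system small enough to list all $2^N$ configurations; the function names are illustrative and not taken from any library.

```python
import itertools
import math
import random

def encode(xi, p, j0, j, rng):
    """Send every p-spin product of xi through the assumed Gaussian channel."""
    n = len(xi)
    scale = math.factorial(p) / n ** (p - 1)
    mean_amp = j0 * scale                 # channel mean is mean_amp * xi_i1...xi_ip
    std = j * math.sqrt(scale / 2.0)      # channel variance is J^2 p! / (2 N^(p-1))
    couplings = {}
    for idx in itertools.combinations(range(n), p):
        product = math.prod(xi[i] for i in idx)
        couplings[idx] = rng.gauss(mean_amp * product, std)
    return couplings

def energy(sigma, couplings):
    """Classical Hamiltonian H(sigma) = -sum J sigma_i1 ... sigma_ip."""
    return -sum(jv * math.prod(sigma[i] for i in idx)
                for idx, jv in couplings.items())

def decode(couplings, n, beta):
    """Return (MAP estimate, MPM estimate) by enumerating all 2^n configurations."""
    configs = list(itertools.product((1, -1), repeat=n))
    energies = [energy(s, couplings) for s in configs]
    e_min = min(energies)
    map_estimate = configs[energies.index(e_min)]
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    means = [sum(w * s[i] for w, s in zip(weights, configs)) / z for i in range(n)]
    mpm_estimate = tuple(1 if mi >= 0 else -1 for mi in means)
    return map_estimate, mpm_estimate

def overlap(xi, estimate):
    """Empirical overlap (1/N) sum_i xi_i * estimate_i."""
    return sum(a * b for a, b in zip(xi, estimate)) / len(xi)

rng = random.Random(1)
n, p = 8, 3
xi = [rng.choice((1, -1)) for _ in range(n)]
couplings = encode(xi, p, j0=3.0, j=1.0, rng=rng)
map_est, mpm_est = decode(couplings, n, beta=2.0)
```

An odd $p$ is used so that the posterior is not symmetric under a global spin flip; for even $p$ the marginal means vanish identically and the sign in the MPM estimate becomes ambiguous.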

Quantum fluctuations can be utilized to decode the original information. The 
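Hamiltonian of Sourlas codes using quantum fluctuations is expressed as follows:

$$
\hat{H} = s \hat{H}_0 + (1 - s) \hat{H}_{\mathrm{TF}}, \tag{13.8}
$$
$$
\hat{H}_0 = - \sum_{i_1 < \cdots < i_p} J_{i_1 \ldots i_p} \hat{\sigma}^{z}_{i_1} \cdots \hat{\sigma}^{z}_{i_p}, \tag{13.9}
$$
$$
\hat{H}_{\mathrm{TF}} = - \sum_{i=1}^{N} \hat{\sigma}^{x}_{i}, \tag{13.10}
$$

where $\hat{\sigma}^{z}_{i}$ and $\hat{\sigma}^{x}_{i}$ are the $z$ and $x$ components of the Pauli matrix
at site $i$. We parameterize the Hamiltonian by the annealing parameter $s$ for the ARA
formulation. Note that $\hat{H}_0$ and $\hat{H}_{\mathrm{TF}}$ consist of the $z$ and $x$ components of
the Pauli matrices, respectively. As in the classical case, we can consider the MPM estimation
using quantum fluctuations. The performance of the MPM estimation using quantum fluctuations
can be evaluated by the overlap as follows:

$$
M(\beta, s) = \mathrm{Tr}_{\xi} \int \prod_{i_1 < \cdots < i_p} dJ_{i_1 \ldots i_p} \,
P(\{J_{i_1 \ldots i_p}\} \mid \{\xi\}) P(\xi) \, \xi_i \, \mathrm{sgn} \langle \hat{\sigma}^{z}_{i} \rangle
= [\xi_i \, \mathrm{sgn} \langle \hat{\sigma}^{z}_{i} \rangle], \tag{13.11}
$$

where $\langle \cdot \rangle = \mathrm{Tr}(\cdot \, \hat{\rho})$ denotes the expectation over the density
matrix $\hat{\rho} = e^{-\beta \hat{H}} / \mathrm{Tr} \, e^{-\beta \hat{H}}$.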
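For a handful of spins, the quantum MPM estimate can be evaluated directly from the density matrix. The sketch below is an illustration rather than the method analysed in this chapter: it assumes noiseless couplings $J_{i_1 \ldots i_p} = \xi_{i_1} \cdots \xi_{i_p}$ for a hypothetical four-spin signal, builds $s\hat{H}_0 + (1-s)\hat{H}_{\mathrm{TF}}$ as a dense matrix, and computes $\langle \hat{\sigma}^z_i \rangle$ from the eigendecomposition.

```python
import itertools
import numpy as np

def kron_all(ops):
    """Kronecker product of a list of single-site operators."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

def pauli(kind, site, n):
    """Pauli x or z matrix acting on `site` of an n-spin system."""
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    sz = np.array([[1.0, 0.0], [0.0, -1.0]])
    op = sx if kind == "x" else sz
    return kron_all([op if k == site else np.eye(2) for k in range(n)])

def thermal_sigma_z(couplings, n, s, beta):
    """<sigma^z_i> under rho = exp(-beta H) / Tr exp(-beta H), H = s H0 + (1-s) H_TF."""
    sz = [pauli("z", i, n) for i in range(n)]
    h = np.zeros((2 ** n, 2 ** n))
    for idx, jv in couplings.items():
        term = np.eye(2 ** n)
        for i in idx:
            term = term @ sz[i]
        h -= s * jv * term
    for i in range(n):
        h -= (1.0 - s) * pauli("x", i, n)
    levels, vecs = np.linalg.eigh(h)
    boltz = np.exp(-beta * (levels - levels.min()))   # shifted to avoid overflow
    rho = (vecs * boltz) @ vecs.T / boltz.sum()
    return np.array([np.trace(rho @ sz[i]) for i in range(n)])

xi = np.array([1, -1, 1, 1])                          # hypothetical original signal
couplings = {idx: float(np.prod(xi[list(idx)]))
             for idx in itertools.combinations(range(4), 3)}
mz = thermal_sigma_z(couplings, n=4, s=0.8, beta=5.0)
estimate = np.sign(mz)                                # quantum MPM estimate
```

At $s = 0$ only the transverse field remains and every $\langle \hat{\sigma}^z_i \rangle$ vanishes, so no information about the signal can be extracted.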
13.3 Replica Analysis for Adiabatic Reverse Annealing


Following Ref. [13], we formulate Sourlas codes using quantum fluctuations in ARA 
as follows: 


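$$
\hat{H} = s \hat{H}_0 + (1 - s)(1 - \lambda) \hat{H}_{\mathrm{init}} + (1 - s) \lambda \hat{H}_{\mathrm{TF}}, \tag{13.12}
$$
$$
\hat{H}_{\mathrm{init}} = - \sum_{i=1}^{N} \tau_i \hat{\sigma}^{z}_{i}, \tag{13.13}
$$

where $\lambda$ $(0 \le \lambda \le 1)$ is the RA parameter. We now introduce the initial candidate
solution $\tau_i = \pm 1$ that is expected to be close to the correct ground state $\xi_i$. We define
the probability distribution of the initial candidate solutions as follows:

$$
P(\tau) = \prod_{i=1}^{N} P(\tau_i)
= \prod_{i=1}^{N} \left[ c_{1} \delta(\tau_i - \xi_i) + c_{-1} \delta(\tau_i + \xi_i) \right], \tag{13.14}
$$

where we utilize the symbols $c_{1} = c$ and $c_{-1} = 1 - c$. The number $c$ $(0 \le c \le 1)$
denotes the fraction of sites with $\tau_i = \xi_i$ in the initial candidate solution:

$$
c = \frac{1}{N} \sum_{i=1}^{N} \delta_{\tau_i, \xi_i}. \tag{13.15}
$$

We regard the ARA formulation as the case in which we adopt
$P(\sigma \mid \tau) \propto \exp(-\beta \hat{H}_{\mathrm{init}})$ as the prior distribution.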
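In the oracle setting described above, an initial candidate solution is obtained by copying each $\xi_i$ with probability $c$ and flipping it otherwise. A minimal sketch of this sampling step, with illustrative function names, is:

```python
import random

def sample_candidate(xi, c, rng):
    """Draw tau_i = xi_i with probability c and tau_i = -xi_i otherwise."""
    return [x if rng.random() < c else -x for x in xi]

def matching_fraction(xi, tau):
    """Empirical fraction of sites with tau_i = xi_i."""
    return sum(1 for x, t in zip(xi, tau) if x == t) / len(xi)

def init_energy(sigma, tau):
    """Value of H_init = -sum_i tau_i sigma_i on a z-basis configuration sigma."""
    return -sum(t * s for t, s in zip(tau, sigma))
```

The initial Hamiltonian is minimized by $\sigma = \tau$, where `init_energy` returns $-N$; this is the sense in which its ground state represents the candidate solution.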

The typical behaviors of the order parameters, such as the overlap, can be
obtained via the free energy. The free energy density $f$ can be evaluated as
$-\beta f = \lim_{N \to \infty} (1/N) [\ln Z]$, where $Z = \mathrm{Tr} \exp(-\beta \hat{H})$
is the partition function of Eq. (13.12). In general, the direct computation of the
free energy density is hard due to the configuration average of $\ln Z$ and the
off-diagonal elements in Eq. (13.12). The configuration average can be found using the
replica trick [20]. Even though we can avoid the direct computation of $[\ln Z]$, we
cannot apply the standard techniques to evaluate the free energy density due to the
non-commutativity of the Hamiltonian.
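The replica trick rests on the identity $[\ln Z] = \lim_{n \to 0} ([Z^n] - 1)/n$. Its content can be checked on synthetic data; in the sketch below the "partition functions" are simply log-normal random numbers (an assumption made purely for illustration), and the bracket is an empirical average.

```python
import math
import random

def replica_estimate(z_samples, n):
    """([Z^n] - 1) / n, with [.] the empirical average over the samples."""
    avg = sum(z ** n for z in z_samples) / len(z_samples)
    return (avg - 1.0) / n

rng = random.Random(0)
z_samples = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(2000)]
quenched = sum(math.log(z) for z in z_samples) / len(z_samples)   # [ln Z]
estimates = {n: replica_estimate(z_samples, n) for n in (1.0, 0.1, 0.01, 0.001)}
```

As $n$ decreases the estimate approaches the quenched average, whereas $n = 1$ reproduces the annealed quantity $[Z] - 1$. In the analytical calculation, $[Z^n]$ is evaluated for integer $n$ and then continued to $n \to 0$.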

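First, to eliminate the non-commutativity of the Hamiltonian, we apply the
Suzuki-Trotter decomposition [22] to the partition function:

$$
Z = \lim_{M \to \infty} \mathrm{Tr} \left\{
\exp\left( - \frac{\beta}{M} \left( s \hat{H}_0 + (1 - s)(1 - \lambda) \hat{H}_{\mathrm{init}} \right) \right)
\exp\left( - \frac{\beta (1 - s) \lambda}{M} \hat{H}_{\mathrm{TF}} \right) \right\}^{M}
= \lim_{M \to \infty} Z_M, \tag{13.16}
$$

where

$$
\begin{aligned}
Z_M = \mathrm{Tr} \exp \Biggl\{ & \frac{\beta s}{M} \sum_{t=1}^{M} \sum_{i_1 < \cdots < i_p}
J_{i_1 \ldots i_p} \sigma^{z}_{i_1}(t) \cdots \sigma^{z}_{i_p}(t)
+ \frac{\beta (1 - s)(1 - \lambda)}{M} \sum_{t=1}^{M} \sum_{i=1}^{N} \tau_i \sigma^{z}_{i}(t) \\
& + \frac{\beta (1 - s) \lambda}{M} \sum_{t=1}^{M} \sum_{i=1}^{N} \sigma^{x}_{i}(t) \Biggr\}
\prod_{i=1}^{N} \prod_{t=1}^{M}
\langle \sigma^{z}_{i}(t) | \sigma^{x}_{i}(t) \rangle \langle \sigma^{x}_{i}(t) | \sigma^{z}_{i}(t+1) \rangle,
\end{aligned} \tag{13.17}
$$

where the symbol $t$ is the index of the Trotter slice, $M$ is the Trotter number,
and Tr denotes the trace in the $z$ and $x$ bases. We impose the periodic boundary
conditions $\sigma^{z}_{i}(1) = \sigma^{z}_{i}(M + 1)$ for all $i$ and introduce the identity operators
$\hat{1} = \sum_{\{\sigma^{z}(t)\}} |\{\sigma^{z}(t)\}\rangle \langle\{\sigma^{z}(t)\}|$ and
$\hat{1} = \sum_{\{\sigma^{x}(t)\}} |\{\sigma^{x}(t)\}\rangle \langle\{\sigma^{x}(t)\}|$. The detailed
calculation is given in Appendix 1.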
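The convergence of the Suzuki-Trotter product can be seen already for a single spin with Hamiltonian $-g\hat{\sigma}^z - \Gamma\hat{\sigma}^x$, whose exact partition function is $2\cosh(\beta\sqrt{g^2 + \Gamma^2})$. The sketch below is a toy check with illustrative names, not the $N$-spin calculation of the text; it compares the exact value with the trace of the $M$-slice product.

```python
import math

def trotter_partition(g, gamma, beta, M):
    """Tr[(exp(beta*g*sz/M) exp(beta*gamma*sx/M))^M] for a single spin."""
    a, b = beta * g / M, beta * gamma / M
    # exp(a*sz) is diagonal; exp(b*sx) = cosh(b) + sinh(b)*sx.
    step = ((math.exp(a) * math.cosh(b), math.exp(a) * math.sinh(b)),
            (math.exp(-a) * math.sinh(b), math.exp(-a) * math.cosh(b)))
    prod = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(M):
        prod = tuple(
            tuple(sum(prod[i][k] * step[k][j] for k in range(2)) for j in range(2))
            for i in range(2)
        )
    return prod[0][0] + prod[1][1]

def exact_partition(g, gamma, beta):
    """Tr exp(beta*(g*sz + gamma*sx)) = 2 cosh(beta * sqrt(g^2 + gamma^2))."""
    return 2.0 * math.cosh(beta * math.hypot(g, gamma))
```

Evaluating `trotter_partition` for increasing `M` shows the discrepancy with `exact_partition` shrinking, which is the single-spin counterpart of the limit $M \to \infty$ in Eq. (13.16).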

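To evaluate $[\ln Z]$, we utilize the replica trick [20]:

$$
[\ln Z] = \lim_{n \to 0} \frac{[Z^{n}] - 1}{n}, \tag{13.18}
$$

where $n$ is the replica number. The replicated partition function can be written as

$$
\begin{aligned}
[Z^{n}] = \lim_{M \to \infty} & \sum_{\{\xi_i = \pm 1\}} \sum_{\{\tau_i\}} P(\xi) P(\tau)
\int \prod_{i_1 < \cdots < i_p} dJ_{i_1 \ldots i_p} \, P(\{J_{i_1 \ldots i_p}\} \mid \{\xi\}) \\
& \times \mathrm{Tr} \exp \Biggl\{ \frac{\beta s}{M} \sum_{t, a} \sum_{i_1 < \cdots < i_p}
J_{i_1 \ldots i_p} \sigma^{z}_{i_1 a}(t) \cdots \sigma^{z}_{i_p a}(t)
+ \frac{\beta (1 - s)(1 - \lambda)}{M} \sum_{t, a, i} \tau_i \sigma^{z}_{ia}(t) \\
& \qquad + \frac{\beta (1 - s) \lambda}{M} \sum_{t, a, i} \sigma^{x}_{ia}(t) \Biggr\}
\prod_{i, t, a} \langle \sigma^{z}_{ia}(t) | \sigma^{x}_{ia}(t) \rangle
\langle \sigma^{x}_{ia}(t) | \sigma^{z}_{ia}(t+1) \rangle,
\end{aligned} \tag{13.19}
$$

in which $a$ denotes the replica index.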

To remove the dependence on the original signal $\{\xi\}$, we apply the gauge
transformation $J_{i_1 \ldots i_p} \to J_{i_1 \ldots i_p} \xi_{i_1} \cdots \xi_{i_p}$ and
$\sigma^{z}_{ia}(t) \to \sigma^{z}_{ia}(t) \xi_i$ to the partition function
$[Z^{n}]$. Performing the Gaussian integration over the distribution in Eq. (13.1), we
introduce the following order parameters:

$$
m_a(t) = \frac{1}{N} \sum_{i=1}^{N} \sigma^{z}_{ia}(t), \tag{13.20}
$$
$$
q_{ab}(t, t') = \frac{1}{N} \sum_{i=1}^{N} \sigma^{z}_{ia}(t) \sigma^{z}_{ib}(t'), \tag{13.21}
$$
$$
R_a(t, t') = \frac{1}{N} \sum_{i=1}^{N} \sigma^{z}_{ia}(t) \sigma^{z}_{ia}(t'), \tag{13.22}
$$
$$
m^{x}_{a}(t) = \frac{1}{N} \sum_{i=1}^{N} \sigma^{x}_{ia}(t). \tag{13.23}
$$

The physical meanings of the order parameters are as follows: $m_a(t)$ is the
magnetization, $q_{ab}(t, t')$ is the spin-glass order parameter, $R_a(t, t')$ is the correlation
between Trotter slices, and $m^{x}_{a}(t)$ is the transverse magnetization. Moreover,
we introduce the auxiliary parameters $\tilde{m}_a(t)$, $\tilde{q}_{ab}(t, t')$, $\tilde{R}_a(t, t')$,
and $\tilde{m}^{x}_{a}(t)$ of the order parameters with the delta function and its Fourier integral
representation. Under the replica symmetry (RS) ansatz and static approximation,
$m_a(t) = m$, $q_{ab}(t, t') = q$, $R_a(t, t') = R$, $m^{x}_{a}(t) = m^{x}$,
$\tilde{m}_a(t) = \tilde{m}$, $\tilde{q}_{ab}(t, t') = \tilde{q}$, $\tilde{R}_a(t, t') = \tilde{R}$,
$\tilde{m}^{x}_{a}(t) = \tilde{m}^{x}$, we can attain the RS free energy density:


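$$
\begin{aligned}
-\beta f_{\mathrm{RS}} = {} & \beta s J_0 m^{p} + \frac{\beta^{2} s^{2} J^{2}}{4} \left( R^{p} - q^{p} \right)
+ \beta (1 - s) \lambda m^{x} - \beta m \tilde{m} - \beta m^{x} \tilde{m}^{x} \\
& - \frac{\beta^{2}}{2} \left( R \tilde{R} - q \tilde{q} \right)
+ \sum_{a = \pm 1} c_a \int Dz \, \ln 2 Y_a,
\end{aligned} \tag{13.24}
$$
$$
Y_a = \int Dy \, \cosh \beta u_a, \tag{13.25}
$$
$$
u_a = \sqrt{g_a^{2} + (\tilde{m}^{x})^{2}}, \tag{13.26}
$$
$$
g_a = \tilde{m} + a (1 - s)(1 - \lambda) + \sqrt{\tilde{q}} \, z + \sqrt{\tilde{R} - \tilde{q}} \, y, \tag{13.27}
$$

where $Dz$ denotes the Gaussian measure $Dz := dz \, e^{-z^{2}/2} / \sqrt{2\pi}$, and $Dy$ is
defined in the same way. Detailed calculations for deriving the free energy density in Eq. (13.24)
are provided in Appendix 2. The order parameters and their auxiliary parameters
are determined by the saddle-point conditions of the free energy density. The
extremization of Eq. (13.24) yields the following saddle-point equations: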


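$$
m = \sum_{a = \pm 1} c_a \int Dz \, Y_a^{-1} \int Dy \left( \frac{g_a}{u_a} \right) \sinh \beta u_a, \tag{13.28}
$$
$$
q = \sum_{a = \pm 1} c_a \int Dz \left[ Y_a^{-1} \int Dy \left( \frac{g_a}{u_a} \right) \sinh \beta u_a \right]^{2}, \tag{13.29}
$$
$$
R = \sum_{a = \pm 1} c_a \int Dz \, Y_a^{-1} \int Dy \left[
\frac{(\tilde{m}^{x})^{2}}{\beta u_a^{3}} \sinh \beta u_a
+ \left( \frac{g_a}{u_a} \right)^{2} \cosh \beta u_a \right], \tag{13.30}
$$
$$
m^{x} = \sum_{a = \pm 1} c_a \int Dz \, Y_a^{-1} \int Dy \left( \frac{\tilde{m}^{x}}{u_a} \right) \sinh \beta u_a, \tag{13.31}
$$
$$
\tilde{m} = s J_0 p \, m^{p-1}, \tag{13.32}
$$
$$
\tilde{q} = \frac{s^{2} J^{2}}{2} p \, q^{p-1}, \tag{13.33}
$$
$$
\tilde{R} = \frac{s^{2} J^{2}}{2} p \, R^{p-1}, \tag{13.34}
$$
$$
\tilde{m}^{x} = (1 - s) \lambda. \tag{13.35}
$$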


From Eq. (13.11), the overlap function is easily expressed as

$$
M(\beta, s, \lambda) = \sum_{a = \pm 1} c_a \int Dz \, \mathrm{sgn} \left[
\int Dy \left( \frac{g_a}{u_a} \right) \sinh \beta u_a \right]. \tag{13.36}
$$


In the low-temperature region, the p-body spin-glass model is known to exhibit
replica symmetry breaking (RSB) [4]. The stability condition of RS solutions under
the static approximation is expressed as

$$
\frac{p (p - 1)}{2} \beta^{2} s^{2} J^{2} q^{p-2} \sum_{a = \pm 1} c_a \int Dz \, \Delta_a^{2} < 1, \tag{13.37}
$$
$$
\Delta_a = \left[ Y_a^{-1} \int Dy \left( \frac{g_a}{u_a} \right) \sinh \beta u_a \right]^{2}
- Y_a^{-1} \int Dy \left[ \frac{(\tilde{m}^{x})^{2}}{\beta u_a^{3}} \sinh \beta u_a
+ \left( \frac{g_a}{u_a} \right)^{2} \cosh \beta u_a \right]. \tag{13.38}
$$

This condition, called the de Almeida-Thouless (AT) condition [1], can be attained
by considering perturbations to the RS solutions. This result is consistent with the
previous result in Ref. [25] for $p = 2$, $J_0 = 0$, and $\lambda = 1$.


13.4 Numerical Experiments 


We numerically solve the saddle-point equations in Eqs. (13.28)-(13.35) with $p = 5$,
temperature $T = 0.05$, and signal-to-noise ratio $J_0/J = 1.5$. The typical MPM
estimation performance is usually evaluated by the overlap $M(\beta, s, \lambda)$. In this chapter,
we focus mainly on the possibility of avoiding the first-order phase transition by ARA.
For the sake of simplicity and computational cost, we adopt the magnetization as
a measure of the average MPM estimation performance using ARA. Figure 13.1a
shows the phase diagram of the Sourlas codes using quantum fluctuations in ARA.
We consider three initial conditions: $c = 0.7$, 0.8, and 0.95. Each line represents the
points of the first-order phase transition. We call these lines "critical" lines. We can
avoid a first-order phase transition by preparing proper initial conditions. When
we increase the fraction of the ground state in the initial candidate solution, the region
where we can avoid the first-order phase transition becomes wider.

Fig. 13.1 (a) Phase diagram of Sourlas codes in ARA for $c = 0.7$, 0.8, and 0.95. The vertical and
horizontal axes represent the annealing parameter and the RA parameter, respectively. Each line
represents the points where the first-order phase transition occurs. The AT condition is broken
above the AT line. (b) Differences in magnetization between the two local minima at the
first-order phase transition in panel (a). The vertical axis denotes this difference, while the
horizontal axis represents the RA parameter
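One way to solve saddle-point equations of this type numerically is a damped fixed-point iteration with Gauss-Hermite quadrature for the Gaussian measures. The sketch below is an assumption about how such a solver could look, not the procedure used to produce Fig. 13.1: it takes the effective field to be $g_a = \tilde{m} + a(1-s)(1-\lambda) + \sqrt{\tilde{q}}\,z + \sqrt{\tilde{R}-\tilde{q}}\,y$ with $u_a = \sqrt{g_a^2 + (\tilde{m}^x)^2}$, and the iteration count, damping, and starting point `m0` are arbitrary choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def solve_saddle_point(p, beta, j0, j, s, lam, c, m0=0.9,
                       n_nodes=40, iters=300, damping=0.5):
    """Damped fixed-point iteration for (m, q, R, m^x) under the RS ansatz."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()            # normalized Gaussian measure
    z = nodes[:, None]                           # outer integration variable
    y = nodes[None, :]                           # inner integration variable
    m, q, r = m0, m0 ** 2, 1.0
    mx = 0.0
    mx_hat = (1.0 - s) * lam                     # conjugate of m^x
    for _ in range(iters):
        m_hat = s * j0 * p * m ** (p - 1)
        q_hat = 0.5 * (s * j) ** 2 * p * q ** (p - 1)
        r_hat = 0.5 * (s * j) ** 2 * p * r ** (p - 1)
        new = np.zeros(4)
        for a, c_a in ((1, c), (-1, 1.0 - c)):
            g = (m_hat + a * (1.0 - s) * (1.0 - lam)
                 + np.sqrt(q_hat) * z + np.sqrt(max(r_hat - q_hat, 0.0)) * y)
            u = np.sqrt(g ** 2 + mx_hat ** 2) + 1e-12   # guard against u = 0
            t = beta * u
            shift = t.max(axis=1, keepdims=True)        # avoids overflow in cosh/sinh
            cosh_t = 0.5 * np.exp(t - shift) * (1.0 + np.exp(-2.0 * t))
            sinh_t = 0.5 * np.exp(t - shift) * (1.0 - np.exp(-2.0 * t))
            y_a = (cosh_t * weights).sum(axis=1)
            avg_z = ((g / u) * sinh_t * weights).sum(axis=1) / y_a
            avg_zz = ((mx_hat ** 2 / (beta * u ** 3) * sinh_t
                       + (g / u) ** 2 * cosh_t) * weights).sum(axis=1) / y_a
            avg_x = ((mx_hat / u) * sinh_t * weights).sum(axis=1) / y_a
            new += c_a * np.array([(weights * avg_z).sum(),
                                   (weights * avg_z ** 2).sum(),
                                   (weights * avg_zz).sum(),
                                   (weights * avg_x).sum()])
        m, q, r = (damping * old + (1.0 - damping) * val
                   for old, val in zip((m, q, r), new[:3]))
        mx = new[3]
    return m, q, r, mx
```

For the parameters quoted above one would call, for example, `solve_saddle_point(p=5, beta=20.0, j0=1.5, j=1.0, s=0.5, lam=0.5, c=0.95)`. Near a first-order transition the result depends on the starting point, so locating the transition requires running the iteration from several values of `m0` and comparing the free energies of the solutions found.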

We also compute the AT condition in Eq. (13.37). As shown in Fig. 13.1a, the AT
condition is broken between the AT line and the "critical" line for $c = 0.7$. If the
fraction of the ground state in the initial candidate solution is not large enough, the
spin-glass phase emerges and RSB occurs. The emergence of RSB implies the existence
of a metastable state. Figure 13.1a shows that we can avoid RSB if we tune the RA
parameter $\lambda$. For $c = 0.8$, the AT condition is broken in the low-$\lambda$ region. The region
where the AT condition is broken is smaller than that for $c = 0.7$. Since we cannot
distinguish the AT line from the "critical" line at this scale, we omit the AT line for
$c = 0.8$ from Fig. 13.1a. For $c = 0.95$, the AT condition holds. Therefore, the local stability
of the RS solution is recovered if we can prepare proper initial conditions.

To evaluate the extent to which ARA mitigates the difficulty of estimating the
original signal, we plot the differences in the magnetization $\Delta m$ between the two
local minima at the first-order phase transition for $c = 0.7$, 0.8, and 0.95. A large
difference in the magnetization means that the two local minima of the free energy
are well separated. Figure 13.1b shows that $\Delta m$ decreases as $c$ increases. The two local
minima of the free energy are brought closer by ARA. As discussed in Ref. [13], the quantum
tunneling rate between two local minima in the free-energy landscape increases if
the distance between the two local minima is smaller. Our results demonstrate that
ARA for Sourlas codes enhances the quantum tunneling effects if we prepare an
appropriate initial condition. This result is consistent with that for the CDMA model [2].
13.5 Summary


In this chapter, we explained a mean-field analysis of ARA for Sourlas codes. Sourlas
codes have a first-order phase transition for p > 2, which deteriorates their estimation
performance. To avoid the first-order phase transition, we applied ARA to
Sourlas codes. The first-order phase transition can be avoided by preparing
proper initial conditions. The region where the first-order phase transition can be
avoided becomes larger as c increases. We investigated the differences in magnetization
between the two local minima at the first-order phase transition. When ARA
was applied, the two local minima of the free energy came closer if we prepared
proper initial conditions. This is expected to improve the probability of escaping a local
minimum by quantum tunneling. This study shows that ARA can be useful for
error-correcting codes.

In practice, we need to prepare the initial candidate solution by
using some algorithm. In the previous study [2] for CDMA multiuser detection,
we utilized the approximate message passing algorithm [8] to prepare the initial
candidate solution. The performance of ARA in this practical case was different from
that in the oracle case, where the initial candidate solution was generated from the original
signal. Evaluation of the performance of ARA in the practical case for Sourlas codes
is an interesting future direction.
Appendix 1: Derivation of Eq. (13.17)


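In this appendix, we derive Eq. (13.17) in detail. We mainly follow Refs. [10,
18, 19]. We consider the $z$ basis as the computational basis. In this case, Tr is replaced
by $\sum_{\{\sigma^{z}\}} \langle \{\sigma^{z}\} | (\cdot) | \{\sigma^{z}\} \rangle$ with
$|\{\sigma^{z}\}\rangle = \bigotimes_{i=1}^{N} |\sigma^{z}_{i}\rangle$. For the $z$ basis, we introduce $M$
copies of the identity operator
$\hat{1} = \sum_{\{\sigma^{z}(t)\}} |\{\sigma^{z}(t)\}\rangle \langle\{\sigma^{z}(t)\}|$ into Eq. (13.16),

$$
\begin{aligned}
Z = \lim_{M \to \infty} \prod_{t=1}^{M} \sum_{\{\sigma^{z}(t)\}}
& \exp\left( - \frac{\beta}{M} \sum_{t=1}^{M} \left( s H_0(t) + (1 - s)(1 - \lambda) H_{\mathrm{init}}(t) \right) \right) \\
& \times \prod_{t=1}^{M} \langle \{\sigma^{z}(t)\} |
\exp\left( - \frac{\beta (1 - s) \lambda}{M} \hat{H}_{\mathrm{TF}} \right)
| \{\sigma^{z}(t+1)\} \rangle,
\end{aligned} \tag{13.39}
$$

where we introduce the periodic boundary condition
$|\{\sigma^{z}(1)\}\rangle = |\{\sigma^{z}(M + 1)\}\rangle$. To
show the dependence of the spin operator on the Trotter index, arguments are added
to each Hamiltonian in Eq. (13.39). For the $x$ basis, we similarly introduce $M$ copies
of the identity operator
$\hat{1} = \sum_{\{\sigma^{x}(t)\}} |\{\sigma^{x}(t)\}\rangle \langle\{\sigma^{x}(t)\}|$ into Eq. (13.39). The last
term in Eq. (13.39) can be written as

$$
\prod_{t=1}^{M} \sum_{\{\sigma^{x}(t)\}}
\exp\left( - \frac{\beta (1 - s) \lambda}{M} H_{\mathrm{TF}}(t) \right)
\prod_{i=1}^{N} \langle \sigma^{z}_{i}(t) | \sigma^{x}_{i}(t) \rangle
\langle \sigma^{x}_{i}(t) | \sigma^{z}_{i}(t+1) \rangle. \tag{13.40}
$$

Finally, we can obtain Eq. (13.17) in the main text as

$$
\begin{aligned}
Z_M = \mathrm{Tr} \exp \Biggl\{ & \frac{\beta s}{M} \sum_{t=1}^{M} \sum_{i_1 < \cdots < i_p}
J_{i_1 \ldots i_p} \sigma^{z}_{i_1}(t) \cdots \sigma^{z}_{i_p}(t)
+ \frac{\beta (1 - s)(1 - \lambda)}{M} \sum_{t=1}^{M} \sum_{i=1}^{N} \tau_i \sigma^{z}_{i}(t) \\
& + \frac{\beta (1 - s) \lambda}{M} \sum_{t=1}^{M} \sum_{i=1}^{N} \sigma^{x}_{i}(t) \Biggr\}
\prod_{i=1}^{N} \prod_{t=1}^{M}
\langle \sigma^{z}_{i}(t) | \sigma^{x}_{i}(t) \rangle \langle \sigma^{x}_{i}(t) | \sigma^{z}_{i}(t+1) \rangle,
\end{aligned} \tag{13.41}
$$

where Tr denotes the summation over all the possible spin configurations $\{\sigma^{z}_{i}(t)\}$ and
$\{\sigma^{x}_{i}(t)\}$. Since the first factor in Eq. (13.41) consists of commuting numbers, we can
take the configuration average over the data distribution.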


Appendix 2: Derivation of the RS Free Energy 


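We derive the free energy density under the RS ansatz and the static approximation.
After the gauge transformation $J_{i_1 \ldots i_p} \to J_{i_1 \ldots i_p} \xi_{i_1} \cdots \xi_{i_p}$ and
$\sigma^{z}_{ia}(t) \to \sigma^{z}_{ia}(t) \xi_i$, we integrate over $J_{i_1 \ldots i_p}$,

$$
\begin{aligned}
& \prod_{i_1 < \cdots < i_p} \int dJ_{i_1 \ldots i_p} P(J_{i_1 \ldots i_p})
\exp\left\{ \frac{\beta s}{M} J_{i_1 \ldots i_p} \sum_{t, a} \sigma^{z}_{i_1 a}(t) \cdots \sigma^{z}_{i_p a}(t) \right\} \\
& \quad = \exp\left\{ \frac{\beta s J_0 N}{M} \sum_{t, a} \left( \frac{1}{N} \sum_{i} \sigma^{z}_{ia}(t) \right)^{p}
+ \frac{\beta^{2} s^{2} J^{2} N}{4 M^{2}} \sum_{t, t'} \sum_{a, b}
\left( \frac{1}{N} \sum_{i} \sigma^{z}_{ia}(t) \sigma^{z}_{ib}(t') \right)^{p} \right\},
\end{aligned} \tag{13.42}
$$

where we use the expression
$\sum_{i_1 < \cdots < i_p} \sigma_{i_1} \cdots \sigma_{i_p} = (N^{p}/p!) \left( N^{-1} \sum_{i} \sigma_i \right)^{p} + O(N^{p-1})$.
We introduce the delta function and its Fourier integral representation
for Eqs. (13.20)-(13.23) as follows: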
We assume the RS ansatz and the static approximation as

$$
m_a(t) = m, \quad q_{ab}(t, t') = q \ (a \neq b), \quad R_a(t, t') = R \ (t \neq t'), \quad m^{x}_{a}(t) = m^{x},
$$
$$
\tilde{m}_a(t) = \tilde{m}, \quad \tilde{q}_{ab}(t, t') = \tilde{q} \ (a \neq b), \quad
\tilde{R}_a(t, t') = \tilde{R} \ (t \neq t'), \quad \tilde{m}^{x}_{a}(t) = \tilde{m}^{x}.
$$

Under the RS ansatz and the static approximation, $e^{G_1}$ is represented as
We compute $e^{G_2}$ under the RS ansatz and the static approximation as follows:
where

$$
g(\tau_i, \xi_i) = \tilde{m} + (1 - s)(1 - \lambda) \tau_i \xi_i + \sqrt{\tilde{q}} \, z + \sqrt{\tilde{R} - \tilde{q}} \, y, \tag{13.54}
$$
$$
g_a = \tilde{m} + a (1 - s)(1 - \lambda) + \sqrt{\tilde{q}} \, z + \sqrt{\tilde{R} - \tilde{q}} \, y. \tag{13.55}
$$


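We apply the Hubbard-Stratonovich transformation,

$$
\exp\left( \frac{x^{2}}{2} \right) = \int Dz \, \exp(x z), \tag{13.56}
$$

to the terms $\left( \beta \sqrt{\tilde{q}} \, M^{-1} \sum_{t, a} \sigma^{z}_{a}(t) \right)^{2} / 2$ and
$\sum_{a} \left( \beta \sqrt{\tilde{R} - \tilde{q}} \, M^{-1} \sum_{t} \sigma^{z}_{a}(t) \right)^{2} / 2$. We
now perform the inverse operation of the Suzuki-Trotter decomposition and take the
trace.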
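The Hubbard-Stratonovich identity, and the way it decouples a squared sum of spins into independent single-spin factors, can be verified numerically. The sketch below uses a plain trapezoidal rule for the Gaussian measure; the helper names and the toy sum over `n` classical spins are illustrative assumptions.

```python
import itertools
import math

def gaussian_average(func, half_width=12.0, steps=4000):
    """Trapezoidal approximation of the integral of func(z) exp(-z^2/2) / sqrt(2 pi)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * func(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def hs_lhs(x):
    """Left-hand side exp(x^2 / 2)."""
    return math.exp(0.5 * x * x)

def hs_rhs(x):
    """Right-hand side: Gaussian average of exp(x z)."""
    return gaussian_average(lambda z: math.exp(x * z))

def coupled_sum(n, a):
    """Sum over n Ising spins of exp((a/2) * (sum_i sigma_i)^2)."""
    return sum(math.exp(0.5 * a * sum(cfg) ** 2)
               for cfg in itertools.product((1, -1), repeat=n))

def decoupled_sum(n, a):
    """Same quantity after the transformation: Gaussian average of (2 cosh(sqrt(a) z))^n."""
    return gaussian_average(lambda z: (2.0 * math.cosh(math.sqrt(a) * z)) ** n)
```

The second pair of functions shows the purpose of the transformation: the spins interact through the square on the left, whereas on the right each spin is summed independently in a common Gaussian field.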
Under the RS ansatz and the static approximation, $e^{G_3}$ is expressed as
In the thermodynamic limit $N \to \infty$, the saddle-point method can be used. The
RS free energy density is then expressed as
The order parameters and their auxiliary parameters can be determined from the
saddle-point conditions.
Part V
Applications


Chapter 14
Structural and Functional Analysis
of Proteins Using Rigidity Theory


Adnan Sljoka 


Abstract Over the past two decades, we have witnessed an unprecedented explo- 
sion in available biological data. In the age of big data, large biological datasets have 
created an urgent need for the development of bioinformatics methods and innovative
fast algorithms. Bioinformatics tools can enable data-driven hypotheses and
interpretation of complex biological data that can advance biological and medicinal
knowledge discovery. Advances in structural biology and computational modelling
have led to the characterization of atomistic structures of many biomolecular
components of cells. Proteins in particular are the most fundamental biomolecules and
the key constituent elements of all living organisms, as they are necessary for cellular
functions. Proteins play crucial roles in immunity, catalysis, metabolism and the
majority of biological processes, and hence there is significant interest in understanding
how these macromolecules carry out their complex functions. The mechanical
heterogeneity of protein structures and a delicate mix of rigidity and flexibility, which
dictates their dynamic nature, are linked to their highly diverse biological functions.
Mathematical rigidity theory and related algorithms have opened up many exciting
opportunities to accurately analyse protein dynamics and probe various biological
enigmas at a molecular level. Importantly, rigidity-theoretical algorithms and methods
run in almost linear time, which makes them suitable for high-throughput and
big-data style analysis. In this chapter, we discuss the importance of protein flexibility
and dynamics and review concepts in mathematical rigidity theory for analysing
stability and the dynamics of protein structures. We then review some recent break- 
through studies, where we designed rigidity theory methods to understand complex 
biological events, such as allosteric communication, large-scale analysis of immune 
system antibody proteins, the highly complex dynamics of intrinsically disordered 
proteins and the validation of Nuclear Magnetic Resonance (NMR) solved protein 
structures. 
14.1 Introduction


In the current post-genomics era, advances in experimental and computational tech- 
niques have revolutionized biological and biomedical research. High-throughput 
technologies have paved the way to novel research avenues where we can systematically
analyse whole genomes of organisms and individual proteins or collections of proteins,
including their structures and interactions with other proteins, which in many cases
allows researchers to successfully decipher their biological functions. Proteins are
macromolecules that are fundamental to most cellular functions [1]. They comprise
the highest levels of molecular and cellular structure and organization, and because 
the majority of physiological and disease processes are manifested within proteins, 
structural and computational biology research is focused on understanding protein 
function. 

Proteins and other biomolecules are nanomachines. Accurate representation of 
their three-dimensional structure is a critical first step to understanding how they per- 
form their functions. Advances in molecular biology, instrumentation, and imaging 
technologies such as X-ray crystallography, nuclear magnetic resonance (NMR), and 
electron microscopy have led to a revolution in structural biology. These techniques 
allow us to see beautiful yet complex three-dimensional shapes of protein structures 
and how they interact with other proteins and ligands. Protein imaging techniques are 
continuously improving, and for many proteins, we can now characterize their struc- 
tures at an individual-atom-level resolution. A rapidly growing and revolutionary 
cryogenic-electron microscopy (cryo-EM) technique has been attracting significant 
attention, as very recently it has broken various resolution barriers [2] and can now 
discern individual atoms of very large protein structures (see Fig. 14.1). Cryo-EM 
complements X-ray crystallography because it reveals atomistic structural details 
without the need for a crystalline specimen. Protein Data Bank (PDB), a repository of 
experimentally solved protein structures, together with computationally determined 
protein structures, make up a rich source of protein structural data. Recent advances 
in AI and deep learning have provided significant improvements in inferring protein 
structures from a sequence of amino acids [3]. DeepMind's AlphaFold method has
demonstrated that deep learning structure predictions can come astonishingly close 
to experimentally determined structures, and in the near future, we expect this will 
result in huge growth of macromolecular structural data. The increasing richness of 
the available protein structural data and the rapidly growing proteomics and bioinfor- 
matics big-data repositories open up possibilities to systematically analyse complex 
biological questions and gain novel biological insights. To facilitate data-driven bio- 
logical knowledge discovery, many bioinformatics and computational biology tools, 
software packages, and databases have been developed [4]. 

Fig. 14.1 Cryo-EM snapshot structure of the viral spike protein of SARS-CoV-2 (a key protein
involved in COVID-19), which is a very large protein structure consisting of three chains
(distinct colours), each consisting of nearly 1300 amino acids

Despite tremendous advances in bioinformatics, structural biology and imaging
technologies which have generated hundreds of thousands of atomic snapshots of
protein structures, many fundamental biological problems such as protein folding,
allosteric regulation, receptor signalling, and enzyme catalysis, to name a few, still
remain largely unresolved [5-12]. While the static high-quality representation of
protein structures can offer clues to structure-function mechanisms, protein function
is almost purely controlled by its dynamic character through a delicate mix of
rigidity and flexibility. Research must move beyond static snapshot representations
of proteins, as the mechanical heterogeneity of protein structures that dictates their
dynamic nature is intimately linked to their highly diverse biological functions. Deep
understanding of the connection between structures and internal protein flexibility,
rigidity, and dynamics is absolutely critical, as it can lead to solutions to the protein
folding problem, elusive allosteric regulation and other dynamically driven biological
secrets of protein regulation.

The primary desire of any protein researcher is to see proteins move in real time 
at the atomistic level while they carry out their biological functions. Yet, despite 
many advances in experimental techniques and molecular dynamics simulations, 
such a goal is still very far from being realized. Analysing and comprehending 
protein flexibility and dynamics has proven to be extremely difficult. One major 
challenge is that the main molecular simulation methods, such as classical molec- 
ular dynamics simulations, require a prohibitive amount of computational power 
and are not suitable to reach biologically relevant functional dynamics that occur 
on longer (millisecond-second) timescales. Furthermore, with rapid growth in the 
number of experimentally solved biomolecular structures and the increasing size of 
structural protein databases, including the expanding big-data size sets of computa- 
tionally predicted protein structures, we are faced with a pressing need to develop 
fast algorithms and novel mathematical and computational techniques that simplify
the classical force fields and can offer experimentally verified accurate predictions 
of protein flexibility and dynamics. 

Techniques inspired by the field of mathematical-structural rigidity theory [13-16]
have gained special attention as they are suitable for handling the many challenges
with computational analysis of protein flexibility and its dynamics. Biological func- 
tions of protein structures are often related to their network (or graph) properties. 
Mathematical rigidity theory offers considerable promise in deciphering graph the- 
oretical properties of protein networks to better understand protein function [13, 
15-18]. In rigidity theory, proteins are modelled as geometric molecular frameworks 
consisting of atoms and various connecting intermolecular forces. Such frameworks 
are essentially multigraphs (networks), in which atoms are vertices and edges form 
various bonding and non-bonding constraints (see Sect. 14.3). The programme FIRST 
[15] and related methods [19] apply mathematical results that provide combinato- 
rial characterization of rigidity and flexibility on a molecular multigraph, which 
can rapidly decompose a protein framework (i.e., multigraph) into flexible and rigid 
regions. Starting with a decomposition of a protein into rigid and flexible regions, fast 
Monte Carlo geometric simulation methods, such as FRODA and FRODAN [19-22], 
can sample the highly complex conformational space of proteins and simulate their 
functionally relevant motions. The main advantage of rigidity theory methods over 
classical molecular dynamics simulations is that their predictions of rigidity and flex- 
ibility are very fast, they are not affected by timescale issues (see Sect. 14.2), and they 
are suitable for high-throughput and big-data style analyses. Moreover, predictions 
based on rigidity theory have been widely shown to be consistent with experimental 
measures of protein flexibility and dynamics [11, 12, 15, 17-19, 22, 24-26]. 

In this chapter, we first discuss the importance of protein flexibility and dynamics 
for biological function (Sect. 14.2). We then provide a brief review of fundamental 
concepts in rigidity theory (Sect. 14.3) that enable us to perform fast predictions 
of flexibility and dynamics of protein structures. We next discuss how to represent 
biomolecules as a graph constraint network, the mathematical/algorithmic back- 
ground for analysing protein networks, and the basic uses of rigidity theory soft- 
ware for analysing protein flexibility and its dynamics. We then review some major 
advances contributed by the author of this chapter, in which rigidity theory and 
algorithms were used to elucidate and provide new perspectives on very complex 
biological phenomena, such as long-range allosteric communication, enzyme cataly- 
sis, antibody dynamics, and NMR structural validation (Sect. 14.4). We conclude by 
reviewing some of these recent developments and some surprising breakthroughs that 
have led to rich protein function discoveries that were mainly driven by mathematical 
rigidity theory. 


14.2 Protein Structural Flexibility and Dynamics 


In this section, we briefly cover for non-biologists the background and the importance 
of predicting protein flexibility, which is arguably one of the most fundamental 
research topics in biochemistry, structural biology, and bioinformatics. 


14.2.1 Protein Flexibility and Dynamics Is Central to Protein 
Function 


Proteins are polypeptide chains composed of a linear sequence(s) of amino acids 
[1]. Through a complex protein folding process, forces are exerted on atoms which 
steer a polypeptide chain(s) into a defined three-dimensional biologically functional 
native-state structural ensemble. High-resolution X-ray crystallography and other 
techniques have revealed the striking structural complexity of proteins and have 
revolutionized our understanding of their function, which in turn has spearheaded 
the development of novel experimental and computational methods for examining 
protein function in atomistic detail. It is important to stress that solved protein struc- 
tures are only snapshots or pictures of proteins at some low-energy state. This can 
often provide a misleading representation of proteins and potentially misinform about 
their function, which must include kinetic and thermodynamic descriptions [5] (see 
Fig. 14.2). 

Proteins are composed of rigid and connecting flexible regions that can be highly 
dynamic, which facilitates sampling a wide variety of conformations spanning a com- 
plex multidimensional energy landscape. In this conformational biomolecular dance, 
proteins undergo dynamical fluctuations even under conditions that are preferentially 
biased towards a well-defined low-energy 'native' state [5]. Such dynamically driven 
conformational states and fluctuations are critical to long-range allosteric regula- 
tions, ligand recognition, catalytic efficiency, antibody-antigen recognition and the 
majority of functional mechanisms. Understanding protein flexibility and rigidity, 
and how they are modified by mutations and ligand binding, is critical to understanding 
and modulating protein function [5, 7, 8, 11, 12]. Most globular proteins (excluding 
intrinsically disordered proteins) function by exploiting a delicate mix of rigidity 
and flexibility. Achieving an appropriate balance between rigidity and flexibility is one 
of the most important keys to biological function. Protein rigidity is necessary, as 
it maintains overall structural fold, while flexibility and dynamics enable proteins to 
perform specific functions. Protein defects can lead to alterations in overall folding, 
cause proteins to become overly flexible and thereby interfere with their function, 
or, in other extreme cases, result in indestructibly rigid proteins. These 
scenarios are related to numerous medical conditions, including neurological disorders, 
Alzheimer's disease, and Mad Cow disease [22, 27]. Hence, predicting and 
examining protein flexibility and dynamics is among the most important, and probably 
most complex, components of protein research. This is an active area of research in 
both experimental protein science and computational biology. 


Fig. 14.2 The structure of an enzyme (Protein Data Bank ID 2jz3) showing a protein snapshot 
representation (a) and a conformational ensemble depicting its dynamical characteristics (b) 

Protein structures can have thousands of conformational degrees of freedom. It is 
therefore easy to imagine that their motions can be extremely complex, and deter- 
mining flexible and rigid regions and how they move relative to one another can seem 
like a daunting task. Moreover, many proteins are oligomeric structures consisting of 
two or more interacting polypeptide chains, and in some cases the structures are very 
large, consisting of thousands of amino acids (see Fig. 14.1). Protein flexibility and 
rigidity are often regulated by interactions with small ligands, drugs, hormones, and 
cations (e.g., calcium and magnesium) and changes in temperature, pressure, and 
pH [11, 15, 17, 18, 24]. Internal motion and conformational change can be rapid 
and transient and result in a structural ensemble that can often be spectroscopically 
indistinguishable from the snapshot ground state determined by X-ray crystallogra- 
phy or other imaging techniques (see Fig. 14.2). Protein dynamics occur across a wide 
range of timescales, from very rapid short-amplitude motions caused by bond vibra- 
tions occurring on a femtosecond range, to side-chain motions on the picosecond to 
nanosecond timescale, all the way up to very slow larger-amplitude collective domain 
motions, which are often biologically most significant, occurring in the milliseconds 
to seconds range [5] (see Fig. 14.3). Dynamics on longer timescales (i.e., millisecond 
to second timescales) are functionally very important because many biological 
processes, including allostery, enzyme catalysis, receptor activation, and 
protein-protein interactions, occur on such timescales [5, 9, 11, 12, 24, 28]. Fluctuations 
between different low-energy states and the heights of their energy barriers can also 
be affected by mutations, ligand binding, and changes in temperature or pH. The 
timescale component of protein dynamics is one critical factor that complicates the 
computational examination of protein dynamics. Another important characteristic of 
protein dynamics is the amplitude and directionality of conformational fluctuations 
[5]. All these factors contribute to the difficulty of obtaining knowledge 
about the flexibility and motion of proteins. 

Despite this complexity, functional motions often involve large domain-domain 
motions (i.e., relative motions dominated by a few rigid bodies), so many 
degrees of freedom can be neglected or suppressed to study the functionally most 
important motions. Hence, decomposition of a protein into rigid and flexible regions 
is a highly important aspect of deciphering protein dynamics. 


14.2.2 Techniques for Analysing and Predicting Protein 
Flexibility and Dynamics 


In terms of experimental techniques, NMR measurements such as order parameter 
measurements and chemical shifts are very useful in studying protein dynamics [24, 
29]. Mass spectrometry, hydrogen-deuterium exchange, crystallographic B-values, 
etc. can also provide deep insights into the dynamical nature of protein structures 
[5, 11, 24, 25]. Fluorescence resonance energy transfer (FRET) [30] measurements in 
particular have high practical value, as they can characterize changes in distance 
for single molecules over time as well as possible corresponding conformational 
changes. However, the disadvantage of FRET is that only a single distance change is 
measured. Experimental measurements are useful because they can be used to infer specific 
information about dynamics across a specific range of timescales (see Fig. 14.3) and 
are particularly helpful in supporting and validating computational predictions. 
The disadvantages of experimental tools are their high cost, susceptibility to uncertainty 
in measurements, and frequent inability to provide information about very dynamic 
regions of protein structures. Moreover, protein structures often have to be stabilized 
to extract structural and dynamical information. Experimental measurements can 
also take a long time to perform and require the maintenance of very expensive 
equipment, yet they can rarely provide dynamical information about 
individual atoms. 

Computationally, it should be theoretically possible to describe protein dynamics 
in their entirety. Molecular dynamics (MD) simulation has been the most widely 
used approach for simulating the motions of proteins and other biopolymers [28]. 
Molecular dynamics simulations of proteins have been a common tool in biochem- 
istry and biophysics since the 1970s [31]. It has been successfully applied to protein 
folding problems, the impact of protein motions on enzyme catalysis, and the effects 
of mutations and ligand binding on protein motions [28]. Its uses have increased in 
recent years, pointing to the key importance of deciphering the relationships between 
complex motions and protein function. In molecular dynamics simulations, the trajectories 
of individual atoms in protein structures are predicted by repeated numerical 
solution of Newton's equation of motion (i.e., $F = ma$), with forward integration 
in time, where the force $F$ is derived from a force field (energy function). A force field 
models all potential forces and energies between the molecules and is supposed to be a 
simple parameterization of the energy surface of the protein. A number of different 
methods and force field models exist for parametrizing the potential energy surface. 
Assuming one can use an accurate description of a force field, a difficult and heavily 
debated concept, molecular dynamics simulation can be extremely useful in tracking 
the precise position of atoms over time. However, the major downside of molecular 
dynamics simulations is that they require prohibitive computational 
power. Indeed, despite today's computational advances and special-purpose 
simulation machines [32], in the majority of cases molecular dynamics simulations 
are largely impractical for investigating biologically relevant protein motions on 
relatively long microsecond timescales. Given the increase in protein structural 
data, the increasing size of solved structures, and advances in emerging 
cryo-EM technology and deep learning, it is clear that there is an urgent need to 
develop alternative efficient and accurate computational methods for molecular 
flexibility and dynamics simulations. 


Fig. 14.3 a A one-dimensional cross-sectional representation of a protein's high-dimensional 
energy landscape. Proteins can be defined as multiple collections of low-energy conformational 
states (defined as minima in the energy surface), with many conformational ensemble substates 
interconverting between one another on very fast timescales. The time it takes a protein to transition 
from one low-energy state to another is dependent on the height of the energy barrier between the 
states. When the barrier is high, this can occur in a relatively long microsecond to second range. b 
Timescales of different dynamic processes in proteins and different experimental methods that can 
detect fluctuations on each timescale. Longer timescales are largely inaccessible to classical MD 
simulations. However, rigidity theory methods and simulations are not confined by this timescale 
issue. Figure adapted from [5] 
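
The forward integration of Newton's equation described above can be illustrated with a minimal sketch. This is a toy under stated assumptions, not a protein force field: it moves a single particle on a one-dimensional harmonic spring using the velocity Verlet scheme, a standard integrator chosen here for illustration (the chapter does not name one). The names `velocity_verlet` and `spring_force` and all parameter values are hypothetical.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate F = m*a forward in time for one particle in 1D.

    Returns the list of positions (including the start) and the final velocity.
    """
    a = force(x) / mass
    positions = [x]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance the velocity
        a = a_new
        positions.append(x)
    return positions, v


def spring_force(x, k=1.0):
    """Toy 'force field': a harmonic spring, F = -k*x (illustrative only)."""
    return -k * x


# With k = m = 1 the exact solution is x(t) = cos(t); 628 steps of 0.01
# cover roughly one full period.
positions, v_end = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                                   mass=1.0, dt=0.01, steps=628)

# Why timescales matter: with a femtosecond-scale step (an assumed order of
# magnitude), one microsecond of simulated time needs about 1e9 steps.
steps_per_microsecond = round(1e-6 / 1e-15)
```

Every step requires a fresh force evaluation over all interacting atoms, which is why the step count, rather than the integrator itself, dominates the cost for long timescales.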

A large class of computational approaches that simplify classical force fields has 
been developed. Coarse-grained simulations, normal mode analysis, principal component 
analysis, contact network analysis, and other related methods have become 
popular alternative approaches to classical MD simulations [33]. In coarse-grained 
and network approaches, physical units such as individual amino acids or a cluster of 
amino acids including rigid clusters can be treated as nodes (vertices), where edges 
indicate possible interactions or contacts. For more precise modelling, individual 
atoms should be treated as vertices and edges should model pairwise bonded and 
non-bonded contacts. 

Arguably, one of the most powerful ways of analysing the flexibility and rigidity 
of protein structures, especially using an all-atom representation, is based on 
mathematical rigidity theory [13-16, 19, 34]. Rigorous mathematical results in rigidity 
theory, whose details are explained below, can be used in combination with fast algo- 
rithms to rapidly decompose a protein constraint graph into rigid and flexible regions. 
Moreover, how rigidity is modified through protein-protein, protein-ligand, or other 
interactions can be quickly predicted. Such decompositions are very informative as 
they can be combined with other methods such as MD simulations, normal mode 
analysis, or Monte Carlo simulations [19, 22] to directly infer information about pro- 
tein dynamics. This is discussed in more detail below. We now turn the discussion to 
mathematical formulations and the uses of rigidity theory for the analysis of protein 
structures. 


14.3 Rigidity Theory 


In this section, we present a basic introduction and results of rigidity theory that are 
essential for applications to protein structure and function analysis, with a focus on 
combinatorial rigidity theory concepts. For a thorough review of rigidity theory see 
[13, 19, 34]. 


14.3.1 Combinatorial Rigidity Theory and the Molecular 
Theorem 


In general terms, flexibility is the ability of a material or framework to reversibly 
change the configuration of its joints, bodies, or building blocks. Rigidity, which is 
the opposite property of flexibility, describes a state in which no relative motions 
are allowed between the framework’s elements. In a rigid structure, only rigid body 
motions are possible (i.e., motions arising from congruences of space, rotations, 
translations, etc.). In biochemistry and biophysics, a notion related to rigidity is 
the concept of stability and robustness, where internal protein dynamics are not 
changed in response to small atomic fluctuations and the breaking of a few non- 
covalent interactions. Although to a non-expert, rigidity and stability may seem like 
related concepts, care should be taken to understand the potential differences and 
their implications. 

Mathematical rigidity theory, sometimes called structural rigidity because of its 
close connections to structural and mechanical engineering, offers the most math- 
ematically sound concepts and algorithms for analysis of rigidity and flexibility of 
frameworks [13, 14, 34]. Rigidity theory analyses the rigidity and flexibility of frame- 
works, as specified by geometric constraints such as fixed distances, directions, and 
volumes defined by a collection of points, lines, planes, or rigid bodies. Frameworks 
can be natural structures (molecules, crystals, proteins, etc.) or engineered structures 
(bridges, robots, etc.), and because rigidity is an essential property of most frame- 
works and materials, rigidity theory naturally has many applications in engineering, 
robotics, material science, and biology. 

Rigidity theory has both geometric and combinatorial characteristics relying on 
techniques in linear algebra, discrete and algebraic geometry, graph theory, and com- 
binatorics. Rigidity theory has a very long and rich history in mathematics, with early 
work appearing in the form of Euler’s (1766) conjectures on rigidity of polyhedra. 
Maxwell’s (1864) [14, 34] work on counting constraints in a framework for generic 
rigidity led to the birth of so-called ‘combinatorial rigidity’. Combinatorial charac- 
terization of rigidity theory, 140 years later, has turned out to be absolutely crucial for 
rapid flexibility analysis of materials such as glass networks and protein structures 
[14]. 

The classical and simplest frameworks studied in rigidity theory are the bar and 
joint frameworks (see Fig. 14.4), which are composed of universal (rotating) joints 
that are connected by bars that fix the distances between pairs of joints. A bar and 
joint framework is defined as a pair $(G, p)$, where $G = (V, E)$ is an undirected 
graph and $p: V \to \mathbb{R}^d$; vertices correspond to joints, edges correspond 
to bars that connect some pairs of joints, and $p$ represents a configuration of joints in 
$\mathbb{R}^d$. A framework $(G, p)$ in $\mathbb{R}^d$ is rigid if the only 
edge-length-preserving continuous motions of the vertices are derived from isometries 
of $\mathbb{R}^d$. If $d \ge 2$, it is NP-hard 
to determine if a bar and joint framework is rigid [34]. As determining the rigidity 
of frameworks is very difficult, a common approach is to linearize the problem by 
differentiating the length (bar) constraints of the corresponding pairs of connected 
points (joints), which leads to a system of linear equations (one equation per edge) 
and a corresponding rigidity matrix. The solution space of this homogeneous system 
is characterized by the rank of the rigidity matrix, which indicates whether a 
framework is infinitesimally rigid [34, 35]. However, in many applications and large 
frameworks such as proteins, this is not particularly practical owing to numerical 
errors and uncertainty in rank computations of the rigidity matrix. 


Fig. 14.4 Bar and joint framework examples: a is flexible as it can deform its shape (note it is one 
edge short in terms of Laman's count, $|E| < 2|V| - 3$); b is minimally rigid in 2D (but flexible 
in 3D as one can rotate two triangles around the diagonal); c is redundantly rigid in 2D as it has a 
redundant (i.e., extra) edge and is minimally rigid in 3D 
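
The rank test described above can be sketched for small frameworks. The following is an illustrative sketch, not the method used by any of the cited software: it builds one rigidity-matrix row per bar and compares the rank with $dN - d(d+1)/2$. To sidestep the numerical rank issues mentioned in the text, it assumes integer joint coordinates and uses exact rational arithmetic, which does not scale to protein-sized frameworks. All function names are hypothetical.

```python
from fractions import Fraction


def rigidity_matrix(points, edges):
    """One row per bar (i, j): p_i - p_j in the columns of i, p_j - p_i in those of j."""
    d = len(points[0])
    rows = []
    for i, j in edges:
        row = [Fraction(0)] * (d * len(points))
        for c in range(d):
            diff = Fraction(points[i][c] - points[j][c])
            row[d * i + c] = diff
            row[d * j + c] = -diff
        rows.append(row)
    return rows


def rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [r[:] for r in rows]
    rk = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            if m[r][col] != 0:
                f = m[r][col] / m[rk][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


def is_infinitesimally_rigid(points, edges):
    """Rank test; assumes at least d joints that are not all in a lower-dimensional flat."""
    d, n = len(points[0]), len(points)
    return rank(rigidity_matrix(points, edges)) == d * n - d * (d + 1) // 2


# Hypothetical coordinates for a quadrilateral, with and without a diagonal.
quad = [(0, 0), (3, 0), (4, 2), (1, 3)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
braced = cycle + [(0, 2)]
# Special (non-generic) geometry: a "triangle" whose joints are collinear.
collinear = [(0, 0), (1, 0), (2, 0)]
triangle = [(0, 1), (1, 2), (0, 2)]
```

The collinear triangle shows why genericity matters: its graph is generically rigid in the plane, yet this particular placement fails the rank test.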

A well-known fact within rigidity theory is that if the framework is generic (i.e., 
it does not have special singular geometry), then rigidity and infinitesimal rigidity 
coincide [34]. Generic frameworks are very important, as rigidity can be studied 
by pure graph and combinatorial techniques—a subfield of rigidity theory called 
combinatorial rigidity theory. A framework is generically rigid if it maintains rigidity 
even after minor changes to the position of its joints, and almost all frameworks are 
generic [13, 34, 36]. By assuming that a framework is in a generic position, one can 
neglect the geometric embedding of joints and actual distances of bars to focus on 
only the topology of the bar and joint framework and discuss the generic rigidity of 
(G, p) in terms of graph G. 


14.3.1.1 Counting for Rigidity and Flexibility 


We now motivate the characterization of rigidity of generic frameworks using 
combinatorial arguments. For bar and joint frameworks in dimension $d$, each joint (point, 
vertex) has $d$ degrees of freedom; hence, $N$ joints have a total of $dN$ degrees 
of freedom. The number of trivial rigid body motions (isometries) in dimension $d$ 
is $d(d + 1)/2$. Therefore, in a generically rigid bar and joint framework, the number of 
bars is at least $dN - d(d + 1)/2$. This is known as Maxwell's counting condition. In the 
plane ($d = 2$), Laman's theorem [34] extends this result by proving that the $2N - 3$ 
count is both necessary and sufficient for generic rigidity of two-dimensional bar 
and joint frameworks. More formally, a two-dimensional bar and joint framework 
is generically minimally rigid if and only if $|E| = 2N - 3$ and, for every non-empty 
subset of edges $E'$ spanning $N'$ joints, $|E'| \le 2N' - 3$. In other words, this 
remarkable theorem says one can 
count the vertices and edges in a graph and their distributions over subgraphs to 
predict generic rigidity of two-dimensional bar and joint frameworks. A rigid framework 
is minimally rigid if removal of any edge (bar) results in a flexible framework (see 
Fig. 14.4). 


Fig. 14.5 Maxwell's counts in 3D do not guarantee rigidity. A bar and joint framework in 3D 
(known as the double banana graph) satisfies the $3|V| - 6$ count condition but is flexible 
(two yellow rigid subgraphs can rotate about an imaginary hinge shown as a red dashed line) 
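
Laman's counting condition can be checked directly on small graphs. The sketch below is a brute-force illustration only: it enumerates every vertex subset (equivalent to checking every edge subset, since the tightest case for a set of joints is its induced subgraph), so it takes exponential time. The function name `is_laman` and the example graphs are assumptions; the graphs are meant to resemble the three cases described in the Fig. 14.4 caption.

```python
from itertools import combinations


def is_laman(num_vertices, edges):
    """True if |E| = 2N - 3 and every subset of N' >= 2 joints induces
    at most 2N' - 3 edges (generic minimal rigidity in the plane)."""
    if len(edges) != 2 * num_vertices - 3:
        return False
    for size in range(2, num_vertices + 1):
        for subset in combinations(range(num_vertices), size):
            chosen = set(subset)
            induced = sum(1 for u, v in edges if u in chosen and v in chosen)
            if induced > 2 * size - 3:
                return False
    return True


square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # one edge short: flexible
braced_square = square + [(0, 2)]           # 2N - 3 = 5 edges: minimally rigid
complete_k4 = braced_square + [(1, 3)]      # one extra edge: redundantly rigid
# Right total count (7 = 2*5 - 3) but badly distributed: K4 plus a dangling joint.
k4_with_pendant = complete_k4 + [(3, 4)]
```

The last example has the correct total number of edges yet fails the subset condition, which is exactly why the counts must hold on all subgraphs and not only globally.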

Unfortunately, Maxwell's counting conditions are not sufficient to characterize minimally 
rigid bar and joint graphs in dimension 3 and higher. A well-known counterexample 
is the double banana graph, which satisfies Maxwell's $3N - 6$ count 
but is flexible (see Fig. 14.5). Not only is there no Laman-type theorem 
for generic bar and joint frameworks in dimension 3 and higher, there are also no known 
polynomial-time algorithms for testing rigidity of general three-dimensional graphs 
[34]. Extensive research has been conducted on this problem and, to date, only some 
partial results and approximation algorithms can be found [34, 35]. Fortunately, 
for different classes of frameworks, called body-bar and body-hinge frameworks, 
which include molecular frameworks, there is a complete and rich combinatorial 
characterization of rigidity, which is discussed next. 


14.3.1.2 Rigidity Model of Molecules and the Molecular Theorem 


To build a computational method based on rigidity theory that can provide fast and 
accurate prediction of protein rigidity and flexibility, three requirements must be met: 
(i) a realistic physical model of a basic molecular framework; (ii) an accurate model 
of molecular interactions; and (iii) a fast algorithm for predicting rigidity/flexibility 
properties of the protein framework model. 

Protein structures consist of atoms and various chemical interactions (forces) of 
different strengths. In rigidity theory, strong interactions between atoms are usually 
assumed to be fixed rigid constraints in terms of distances and angles. In such a rigidity 
model of a molecule, bonding interactions are assumed to fix distances between a pair 
of bonded atoms, and the angles between the bonds of an atom are fixed, allowing 
only dihedral angle rotations. High frequency motions such as bond vibrations are 
neglected. This is a sensible modelling assumption as single covalent bond lengths 
are essentially invariant. For example, the length of a covalent bond between two 
carbon atoms will vary less than a single percent from its equilibrium value of 1.53 
angstroms [14]. Double bonds and peptide bonds lock dihedral angles, and non- 
covalent interactions such as hydrogen bonds and hydrophobic contacts also impose 
additional constraints. 

In rigidity theory, a molecular framework is a collection of atoms modelled as 
fully rigid bodies, each with the six degrees of freedom of a rigid 
body, and bonds modelled as rotatable hinges, which allow one rotational degree of freedom 
between single-bonded atoms. Such frameworks are a special case 
of body-hinge frameworks. Hinges (i.e., bonds) remove five degrees of freedom, and 
for algorithmic and theoretical reasons, it is useful to model hinges as a set of five 
rigid bars, where each bar (i.e., edge) generically removes a single degree of freedom 
between bonded atoms. This finally leads to a body-bar framework representation of 
a molecular body-hinge framework—that is, a collection of rigid bodies connected by 
linear bars. Special geometric criteria should be considered as bonds are not generic 
hinges (since bonds intersect at centre of atoms) and the five bars have to pass through 
the hinge axis to geometrically give the same model as a hinge, but such discussion 
is beyond the scope of this chapter (details can be found elsewhere; see [13]). Double 
bonds are modelled as a set of six bars between two atoms. Moreover, non-covalent 
interactions such as hydrogen bonds and hydrophobic interactions, which are impor- 
tant for overall protein structure folding and rigidity, can also be modelled as a set 
of one to five bars (where one bar indicates the bond is least restricting and five bars 
indicate it is most restricting) [25]. This overall model, consisting of rigid bodies for 
atoms and both covalent bonds and non-covalent interactions, defines the body-bar 
framework model of a protein structure (see Fig. 14.6). 

The topological structure of a body-bar (and body-hinge and molecular body- 
hinge) framework is a multigraph G = (V, E). Vertex set V corresponds to a set 
of bodies (i.e., atoms) and edge set E to a set of bars (i.e., bond constraints). In 
analogy with Laman's theorem, an equivalent statement for body-bar frameworks 
was formulated by Tay [37]. Tay's theorem confirms that the rigidity of generic 
body-bar frameworks in 3D can be checked using the 
$6|V| - 6$ count in a body-bar multigraph (the characterization extends to all 
dimensions). Tay's theorem also extends to generic 
body-hinge structures [20]. It was proven by Katoh and Tanigawa [38] that the same 
counting condition stated in Tay's theorem also characterizes the rigidity of generic 
molecular body-hinge frameworks. This result is known as the molecular theorem, 
which is here combined with Tay's theorem into one statement. 


Theorem 1 (Tay's Theorem/Molecular Theorem) A generic three-dimensional 
body-bar framework (or a body-hinge/molecular framework in which bonds (hinges) are 
replaced by five bars) on a multigraph $G = (V, E)$ is minimally rigid if and only if 
$|E| = 6|V| - 6$ and, for all non-empty subsets of edges $E'$ spanning the vertex set 
$V'$, $|E'| \le 6|V'| - 6$. 


Fig. 14.6 a 3D body-hinge framework composed of seven rigid bodies connected by hinges (lines) 
can be modelled as a body-bar framework (with a corresponding body-bar multigraph shown). 
b A molecule consisting of two carbon atoms and a single bond can be viewed as a body-hinge 
structure where atoms are rigid bodies (one-valent hydrogen atoms are a part of a carbon atom 
rigid body, as their angles are fixed and can only spin around their axes) and a hinge is a rotatable 
bond, with corresponding body-bar multigraph. c A ring of seven carbon atoms (ignoring one-valent 
hydrogens) with a corresponding multigraph. (According to the molecular theorem a ring of seven 
atoms will have one internal degree of freedom. The total number of edges is $7 \times 5 = 35$, 
whereas $6 \times 7 - 6 = 36$ edges are needed for rigidity.) d Protein structure can be 
modelled as a molecular body-bar multigraph 
with black, red, and green lines corresponding to covalent bonds, hydrogen bonds, and hydrophobic 
contacts, respectively 
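
The global part of the count in the molecular theorem reduces to simple arithmetic, sketched below for a few small molecules. The helper name and constants are hypothetical, and the result is meaningful only under the assumption that all bars are independent, which the global count alone cannot verify (that requires the subset condition of the theorem).

```python
BARS_PER_SINGLE_BOND = 5   # a hinge removes five degrees of freedom
BARS_PER_DOUBLE_BOND = 6   # a double bond also locks the dihedral rotation


def internal_dof_from_count(num_bodies, num_bars):
    """Internal degrees of freedom implied by the global 6|V| - 6 count.

    Assumes every bar is independent; a negative value signals that at
    least one bar must be redundant.
    """
    return (6 * num_bodies - 6) - num_bars


# Two atoms joined by a single bond keep one dihedral rotation.
single_bond = internal_dof_from_count(2, BARS_PER_SINGLE_BOND)
# Two atoms joined by a double bond form one rigid unit.
double_bond = internal_dof_from_count(2, BARS_PER_DOUBLE_BOND)
# Rings of n single-bonded atoms have n bonds, hence 5n bars.
ring_of_seven = internal_dof_from_count(7, 7 * BARS_PER_SINGLE_BOND)
ring_of_six = internal_dof_from_count(6, 6 * BARS_PER_SINGLE_BOND)
ring_of_five = internal_dof_from_count(5, 5 * BARS_PER_SINGLE_BOND)
```

The ring of seven reproduces the single internal degree of freedom quoted in the Fig. 14.6 caption (35 bars against the 36 required), while the ring of five comes out negative, meaning one of its bars is redundant.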


In the stated original form, Tay's theorem leads to an exponential algorithm, as it 
requires counting the number of edges in every subgraph. However, because these 
counts of G (same as Laman's counts) define an independent set in a matroid [13, 35], 
this gives rise to greedy algorithms that can be used to efficiently track these counts. 
It is well known that all matroidal structures have greedy algorithms. A number 
of fast polynomial algorithms based on matroid unions, tree decompositions, and 
extension of bipartite matching algorithms, such as the pebble game algorithm, were 
subsequently developed for tracking these rigidity-certifying counts (independence) 
in graphs and subgraphs [16, 39]. 


14.3.1.3 Pebble Game Algorithm 


The pebble game algorithm can very rapidly decompose a body-bar/molecular graph 
(i.e., protein structure) into rigid and flexible regions and quantify the overall number 
of degrees of freedom. The main step of the pebble game algorithm is to determine whether 
a constraint (edge) is 'independent' (i.e., removes a degree of freedom) or 'redundant' 
(i.e., its insertion has no effect on rigidity). The algorithm iteratively builds a 
maximal independent set of edges. We give a basic procedure showing how the main steps 
of the pebble game algorithm are carried out for Tay's theorem, without the full details 
or speedups, which can be found in previous publications [16, 39]. A similar procedure 
can be derived for Laman's counts or other matroidal independence counting 
conditions. The pebble game routine given here, 
which tracks the counts in the molecular theorem, underlies the protein flexibility 
analysis implemented in several software packages, such as FIRST (see 
below). 


The Pebble Game Algorithm $6|V| - 6$: 

Input: A multigraph G = (V, E). 

Initialize I(G) and R(G) to empty sets of edges. Place six pebbles on each vertex of G (Fig. 14.6a). 
Test the edges of E in an arbitrary order. 


1. While untested edges remain, take any untested edge e and go to step 2. Otherwise go 
   to step 3. 
2. Count the number of free pebbles on the end vertices of e, say u and v. 

   (a) If u and v together have at least seven free pebbles, place any pebble from either 
       u or v onto e, directing the edge e from that vertex (Fig. 14.6b). Place e into I(G) 
       (independent edges) and return to step 1. 

   (b) Else, search for a free pebble from u and v by following the directed (covered) 
       edges in the partially constructed directed graph on I(G) (Fig. 14.6c). 

       (i) If a free pebble is found on some vertex w at the end of a directed path P (which 
           starts at u or v), perform a swap or sequence of swaps (cascade), reversing the 
           entire path P, until a free pebble appears on the initial vertex (u or v) of the path 
           P (i.e., w loses one free pebble, and u or v gains one free pebble) (Fig. 14.6c-e). 
           Return to step 2. 

       (ii) Else, the seventh free pebble cannot be found, and the edge is declared redundant 
            (it cannot be covered by a pebble) (Fig. 14.7). Place e into R(G) (redundant 
            edges). Return to step 1. 

3. Once all edges have been tested, stop. 
Output: The sets I(G) and R(G) = E - I(G). 


When the algorithm is finished, I(G) is a maximal independent set of edges (edges that are 
covered by pebbles) and R(G) is the set of redundant edges (edges that were not covered by a pebble). 
Total degrees of freedom (DOF) in the graph = number of remaining free pebbles. 


The pebble game algorithm described here tracks the independence of edges in 
graphs prescribed by the molecular theorem. The initialization of placing six free 
pebbles on each vertex (corresponds to six trivial rigid body motions) tracks the 6|V | 
part of the count. Pebbles are synonymous with degrees of freedom and removal of a 
pebble indicates the inserted constraint (edge) is independent. Redundant constraints 
do not remove degrees of freedom (pebbles) as their insertion (or deletion) from an 
already rigid region causes no change in rigidity. Every time an edge is pebbled, it 
grows the set of independent edges. Pebble game algorithms are building a maximal 
subsets that are independent; at every stage, the edges covered by pebbles will satisfy 
|E'| x 6|V’| — боп all subsets. The requirement of at least seven free pebbles on the 
vertices before an edge is pebbled (i.e., declared independent) ensures the critical
subtraction in 6|V| − 6 is respected on all subsets of edges.

Fig. 14.7 A demonstration of the 6|V| − 6 pebble game algorithm on a 3D cyclohexane graph. Edges
are pebbled one by one when there are at least seven free pebbles on their end vertices (a, b). If we
cannot locate seven free pebbles, we can search for free pebbles along the partially created
directed graph, swapping pebbles back. The graph has six remaining free pebbles and all edges are
pebbled, indicating it is minimally rigid

The algorithm is greedy.
In other words, regardless of the order in which the edges are pebbled (i.e., tested for
independence), the algorithm always gives the same answers for the total number of remaining
free pebbles and for the sizes of the maximal independent set I(G) and the redundant set R(G).
The pebble game is a very intuitive algorithm, which in the worst case runs
in O(|V|^2) time [39] and in practice runs in linear time [15] (Fig. 14.8).
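
The steps of the pebble game can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in FIRST: the function names `pebble_game` and `find_pebble`, the edge-list input format, and the depth-first search strategy are all hypothetical choices made for this example. Each vertex starts with six free pebbles, an edge is accepted only when seven free pebbles can be gathered on its two end vertices, and accepted edges are directed away from the vertex whose pebble covers them.

```python
def pebble_game(num_vertices, edges, k=6, l=6):
    """Greedy (k, l)-pebble game on a multigraph given as a list of (u, v) pairs.

    Returns (independent_edges, redundant_edges, remaining_free_pebbles).
    Assumes no self-loops and vertex labels 0 .. num_vertices - 1.
    """
    free = [k] * num_vertices                 # free pebbles on each vertex
    out = [[] for _ in range(num_vertices)]   # out[u]: heads of edges covered by u's pebbles

    def find_pebble(root, other):
        """Search the directed graph from root for a free pebble and pull it back.

        Pebbles resting on `other` (the second end vertex) are never taken.
        """
        parent = {root: None}
        stack = [root]
        while stack:
            x = stack.pop()
            for y in out[x]:
                if y in parent:
                    continue
                parent[y] = x
                if y != other and free[y] > 0:
                    free[y] -= 1
                    node = y
                    while parent[node] is not None:   # reverse every edge on the path
                        prev = parent[node]
                        out[prev].remove(node)
                        out[node].append(prev)
                        node = prev
                    free[root] += 1
                    return True
                stack.append(y)
        return False

    independent, redundant = [], []
    for u, v in edges:
        # Try to gather l + 1 free pebbles on the two end vertices.
        while free[u] + free[v] < l + 1:
            if not (find_pebble(u, v) or find_pebble(v, u)):
                break
        if free[u] + free[v] >= l + 1:
            src, dst = (u, v) if free[u] > 0 else (v, u)
            free[src] -= 1                    # one pebble now covers the edge
            out[src].append(dst)
            independent.append((u, v))
        else:
            redundant.append((u, v))
    return independent, redundant, sum(free)


# Six-membered ring with five bars per bond, as in the cyclohexane example.
ring = [(i, (i + 1) % 6) for i in range(6) for _ in range(5)]
ind, red, dof = pebble_game(6, ring)
print(len(ind), len(red), dof)   # 30 0 6
```

For the ring, all 30 edges are pebbled and six free pebbles remain, which agrees with the minimally rigid outcome described in the caption of Fig. 14.7.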

There are many extensions one can extract from the pebble game [16]. For example,
when we cannot locate the seventh free pebble, the failed search over the directed
graph indicates a rigid cluster. By using this procedure, it is possible to find all the
maximal rigid clusters and redundantly rigid clusters (Fig. 14.7). Predicting
highly redundant rigid clusters is useful to a biochemist, as these
regions have additional robustness and will not become unstable (flexible) when
one or a few edges break. For example, when a hydrogen bond breaks in a
significantly redundantly rigid region, it will not alter the region's rigidity. We can also extract
the relative degree of freedom count for any subgraph in G. This is very useful in
the prediction of flexibility of particular regions of interest in protein graphs, for
example, in antibody protein flexibility studies and in allostery predictions, which are
discussed in the next section.

Fig. 14.8 The 6|V| − 6 pebble game algorithm. a When we cannot pebble an edge, it indicates that the
edge is redundant, and the corresponding failed search locates a redundantly rigid subgraph (b).
Overall, the graph is flexible with one internal degree of freedom, as indicated by the remaining
seven free pebbles. Rigid clusters are circled. Each one of the bonds can be moved with one internal
rotational degree of freedom


14.4 Protein Flexibility, Dynamics, and Function Analysis 
with Rigidity Theory 


14.4.1 FIRST and Rigid Cluster Decomposition 


The pebble game algorithm is the main component of the programme FIRST [15] and 
other related software for analysing protein rigidity and flexibility. Starting with a 
protein structure (experimentally or computationally determined structure) in Protein 
Data Bank File format, the programme FIRST begins by creating a molecular body- 
bar multigraph. The multigraph consists of all atoms (including hydrogen atoms) 
represented by vertices, with covalent bonds, hydrogen bonds, hydrophobic contacts, 
and electrostatic interactions represented by edges. Covalent bonds are modelled as 
five edges, with six edges for double bonds and peptide bonds (as they do not have 
bond rotation), while hydrogen bonds and hydrophobic interactions are modelled 
with between one and five edges [25]. A hydrophobic contact is defined as a pair of
carbon-carbon, carbon-sulfur, or sulfur-sulfur atoms in close contact. Each hydrogen
bond is assigned an energy strength in kcal/mol using an energy potential based on 
hydrogen donor and acceptor geometries. Hydrogen bonds are very important to 
the overall protein shape and stability. A hydrogen bond cutoff energy value (which 
mimics temperature) is selected such that all bonds weaker than this cutoff are ignored 
in the graph. Once the final constraint multigraph is obtained (Fig. 14.6d), FIRST then 
uses the pebble game algorithm and molecular theorem to decompose the protein 
into rigid and flexible regions.

Fig. 14.9 Rigidity and flexibility analysis using FIRST and the pebble game algorithm on protein
data from the Protein Data Bank (Protein Data Bank ID, 2jz3). The hydrogen bond dilution plot
indicates how the protein breaks down as the hydrogen bond cutoff is increased (i.e., energy is
increased), breaking hydrogen bonds one by one. Flexible regions are indicated by thin black lines
and rigid regions are indicated by blocks, with separate colours indicating distinct rigid clusters.
Flexible regions are coloured black on the protein structure. Initially, with inclusion of all potential
hydrogen bonds, the protein is dominated by a few large rigid clusters (indicated by separate colours),
and as hydrogen bonds are gradually broken with increasing energy, most of the protein becomes
flexible (black) with a few remaining rigid clusters
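
The construction of the constraint multigraph described in this subsection can be illustrated with a short Python sketch. It is only a hedged illustration: the function names `build_multigraph` and `to_edge_list`, the tuple-based input format, and the convention that hydrogen bond energies are negative (more negative meaning stronger) are assumptions made here, not details of FIRST. The bar counts for covalent bonds follow the text (five bars for a rotatable bond, six for double and peptide bonds), and the number of bars for each weaker interaction is passed in explicitly because the text only states that it lies between one and five.

```python
from collections import Counter

# Bars per covalent bond, as stated in the text: five for a rotatable single
# bond, six for a locked bond (double or peptide).
COVALENT_BARS = {"single": 5, "double": 6, "peptide": 6}


def build_multigraph(covalent, weak, hbond_cutoff):
    """Return a Counter mapping (atom_i, atom_j) -> number of bars.

    covalent:     iterable of (i, j, kind) with kind in COVALENT_BARS
    weak:         iterable of (i, j, bars, energy); energy is None for
                  hydrophobic contacts and in kcal/mol for hydrogen bonds
    hbond_cutoff: hydrogen bonds weaker than this value (energy above it,
                  assuming negative energies) are ignored
    """
    bars = Counter()
    for i, j, kind in covalent:
        bars[tuple(sorted((i, j)))] += COVALENT_BARS[kind]
    for i, j, n, energy in weak:
        if not 1 <= n <= 5:
            raise ValueError("weak interactions carry between one and five bars")
        if energy is not None and energy > hbond_cutoff:
            continue                      # bond is weaker than the cutoff
        bars[tuple(sorted((i, j)))] += n
    return bars


def to_edge_list(bars):
    """Expand the bar counts into one (u, v) pair per bar."""
    return [pair for pair, n in bars.items() for _ in range(n)]


graph = build_multigraph(
    covalent=[(0, 1, "single"), (1, 2, "peptide")],
    weak=[(0, 2, 5, -3.0), (2, 3, 1, -0.4), (3, 0, 2, None)],
    hbond_cutoff=-1.0,
)
edges = to_edge_list(graph)   # the -0.4 kcal/mol hydrogen bond is dropped
```

The resulting edge list is in the form a pebble game routine would consume, with one entry per bar.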


Figures 14.9 and 14.10 show some examples of rigid cluster decompositions 
obtained with FIRST and the pebble game algorithm for two proteins. The rigid 
cluster decomposition on a very large Spike protein complex consisting of nearly 
4000 residues was obtained in less than one second of running time (Fig. 14.10). We
can monitor gradual changes in the rigid cluster decomposition as hydrogen bonds 
are removed one by one (i.e., by lowering the hydrogen bond energy threshold) in 
the order of increasing bond strength. The change in rigidity can be visualized using 
a hydrogen bond ‘dilution plot’ (Fig. 14.9). Because the pebble game is a combinato- 
rial integer algorithm (tracking molecular theorem counts) as opposed to a numeric 
algorithm, FIRST always gives a unique exact answer. 
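
The dilution procedure, in which hydrogen bonds are removed one by one from weakest to strongest and the rigidity analysis is repeated each time, can be sketched as follows. This is an assumption-laden illustration rather than the FIRST code: `dilution_series` is a hypothetical name, energies are assumed to be negative values in kcal/mol, and the rigidity analysis itself is passed in as a callable `analyse` so that any decomposition routine can be plugged in.

```python
def dilution_series(hbonds, analyse):
    """Remove hydrogen bonds weakest-first and re-run the analysis each time.

    hbonds:  list of (bond_id, energy) with energy in kcal/mol (assumed
             negative; closer to zero means weaker)
    analyse: callable taking the list of surviving bond ids and returning
             any summary, e.g. a rigid cluster decomposition
    Returns a list of (energy_of_last_removed_bond, summary) rows, one per
    line of a dilution plot; the first row uses 0.0 and keeps every bond.
    """
    remaining = sorted(hbonds, key=lambda bond: bond[1])   # strongest first
    rows = [(0.0, analyse([bond_id for bond_id, _ in remaining]))]
    while remaining:
        _, energy = remaining.pop()        # weakest surviving bond
        rows.append((energy, analyse([bond_id for bond_id, _ in remaining])))
    return rows


# Stand-in analysis that only counts the surviving bonds.
rows = dilution_series([("a", -0.5), ("b", -3.2), ("c", -1.1)], analyse=len)
print(rows)   # [(0.0, 3), (-0.5, 2), (-1.1, 1), (-3.2, 0)]
```

In practice `analyse` would rebuild the constraint multigraph from the surviving bonds and run the pebble game on it; the counting function above is only a placeholder.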

While tremendous computational power and resources are needed to simulate pro- 
tein flexibility with MD simulations, FIRST can predict rigid clusters and flexible 
connections in less than one second on a typical PC/laptop. Because of its speed and
efficiency, rigidity theory analysis using FIRST and other related programmes has
been widely applied to various aspects of protein function and flexibility
analysis, such as viral capsids [40] (enormous structures containing hundreds
of copies of protein structures), protein engineering, and prediction and replication
of experimental measures of dynamics such as hydrogen-deuterium exchange,
allostery, and enzyme catalysis [11, 12, 15, 17–19, 23, 24, 26].


14.4.2 Large-Scale Rigidity and Flexibility Analysis 


As an illustration of the efficiency and wider applicability of rigidity theory for
large-scale, high-throughput big-data analyses of protein structures, we review a study in which the
author and colleagues carried out the largest analysis to date of flexibility predictions
for antibody protein structures [41].

Antibodies are proteins produced by B cells that play a central role in the adaptive
immune system. They recognize a variety of pathogens and induce further immune
responses to protect the organism from external disturbance. Molecules that are bound
by antibodies are called antigens. The focus of this study was to characterize the flexibility
of the key hypervariable binding region on the antibody, called the CDR H3 loop, which
is the most important region in binding and recognition of various antigens. More
specifically, we analysed whether the conformational flexibility of the CDR H3 loop is
changed as antibodies undergo affinity maturation. Antibodies can rapidly evolve
to specific antigens, where affinity maturation drives this evolution through multiple
cycles of mutation leading to enhanced antibody specificity and affinity.

[Fig. 14.11, right panel: "H3 Loop Flexibility" plotted against hydrogen-bonding energy cutoff (kcal/mol) for mature (N = 900) and naïve (N = 1011) antibodies]

Fig. 14.11 An antibody is a large Y-shaped molecule. The CDR H3 loop (shown in red) is located on the
surface of each antibody arm, acting as a key region for antigen binding and recognition. In the
study, the authors applied extensions of the pebble game algorithm to analyse flexibility of the H3 loop
using thousands of naïve and mature structures. There was no significant difference in flexibility
between the naïve and mature H3 loops (figure on right adapted from [41])

In this study,
we utilized various extensions of the pebble game algorithm, initially developed in
[16], which enable quantification of the local flexibility of any subgraph, with a focus
on CDR H3 regions. By analysing thousands of mature and naïve antibody crystal
structures and homology models, we found no statistically significant difference
in the flexibility of CDR H3 loops (Fig. 14.11), which was also correlated with
experimental measures of flexibility. Such large-scale analysis of the flexibility of 
protein structures could be carried out because of the speed of the underlying FIRST 
method and our various pebble game extensions. 


14.4.3 Protein Allostery Analysis with Rigidity Theory 


We now briefly discuss and review an important application of rigidity theory for
the analysis of allosteric signalling in protein structures. Allostery is one of the most
powerful and fundamental mechanisms regulating protein function [8–12, 42–44].
Allostery refers to the regulation of protein function at a distance, where a perturbation
in one part of a protein structure (for example, due to
a binding or mutational event) can affect conformations and dynamics at another
distant site, resulting in regulation of protein function. Allostery is a common event
in the cell, and most dynamic proteins exhibit some form of allosteric control mechanism.
Allostery has been referred to as ‘the second secret of life’, second only to
the genetic code [8]. Monod and Jacob first introduced the concept of allostery
in the 1960s [43]; however, most questions pertaining to allostery are still largely unresolved.
Decoding the allosteric mechanism remains one of the key long-standing unsolved
problems in the biological sciences.

One of the important areas in allostery research is describing the physical mechanism
of distant coupled conformational changes. The utilization and extension of
our earlier fundamental work in modelling allostery in frameworks and graphs [16],
together with a first rigidity-based mechanistic model of allosteric signalling, have led to several
important breakthroughs in understanding how allostery controls enzyme and receptor
function [11, 12, 24, 44]. Our rigidity theory methods predict whether a mechanical
perturbation of rigidity at one site of the protein can propagate across the
protein structure and, in turn, cause a change in the available conformational degrees
of freedom and a change in the conformation and dynamics at a second distant site,
resulting in allosteric transmission (Fig. 14.12a). Using various extensions of the
pebble game algorithm, we can analyse how long-range conformational coupling
occurs in protein structures, map out allosteric pathways (regions in the protein that are
important for allosteric signalling), and extract various other properties and features
of long-range coupling.

A popular hypothesis is that dynamical effects play a central role in enzyme 
catalysis. Dynamical changes are often manifested in proteins through allosteric 
effects, where a substrate binding can cause changes in dynamics at remote parts 
of a protein. In a study published in Science [11] concerning the bacterial homodimeric
enzyme fluoroacetate dehalogenase, experimental NMR chemical shift data suggested
that when a substrate binds to one monomer, the second empty monomer
undergoes asymmetrically pronounced conformational changes through an increase
in flexibility and dynamics, thereby entropically favouring the forward reaction. Our
rigidity-based allostery theory was able to verify this and elucidate in great detail the 
key residues involved in the allosteric pathways responsible for changes in dynamics 
and how substrate binding enhances allosteric communication between two subunits 
(Fig. 14.12b). These findings also provided deep insights into the energetic nature of 
allosteric processes that drive catalysis. 

In a follow-up study [24], we showed that when there is a high concentration of 
substrate, the enzyme undergoes catalysis inhibition through the reduction in dynam- 
ics and dampening of interprotomer allosteric effects. Our computational rigidity 
predictions of allosteric networks and resulting changes in dynamics when addi- 
tional substrates were bound to the enzyme were validated with NMR and functional 
experimental studies. These studies represented a major breakthrough in illustrating 
the role of dynamics and allostery in enzyme function. 

Our rigidity-theoretical approaches have been extremely useful for studying 
allostery in other enzymes and proteins. Indeed, we were able to provide a major 
advancement and new level of insight regarding key allosteric processes in GPCR 
activation. GPCRs are situated in the plasma membrane, engage the G-protein and 
initiate cell signalling [45]. In several studies [12], we have shown how interactions
between a GPCR and its natural G-protein binding partner affect activation networks,
which is critical for optimal GPCR activation (Fig. 14.12c), or how sodium, calcium,
and magnesium can affect this activation process [18].

Fig. 14.12 Rigidity theoretical model for allosteric communication. a Conformational changes in
one region of the framework (or protein structures) can propagate and change conformations and
rigidity at distant regions. b Rigidity theory allostery analysis showed that the homodimeric
fluoroacetate dehalogenase enzyme with its substrate fluoroacetate molecule (shown as orange spheres) exhibits
allosteric communication between the two subunits (shown in distinct colours), which is critical for
enzyme catalysis [11, 24]. c In a study of the human adenosine A2A receptor [12, 18], a member of
the superfamily of receptors called G-protein-coupled receptors (GPCRs), a similar approach was used
to discover that allosteric communication between receptors and different domains of the G-protein is
critical for full receptor activation

Our rigidity theory-based
approaches offer a new perspective and opportunity to study the various facets of 
allosteric regulation of protein function, which will allow us to examine complicated 
signalling events in the cell. 


14.4.4 Using Rigidity Theory to Simulate Protein Dynamics 


So far, the discussion has focused on infinitesimal flexibility (which is equivalent 
to finite flexibility, assuming atom positions are in a generic configuration) and not 
on continuous motions. In other words, FIRST and the pebble game outputs do
not simulate protein dynamics or indicate the amplitude of motions. One useful
extension is to combine the rigid cluster decomposition with Monte Carlo-based 
geometric dynamics simulations [20, 21]. Rigid cluster decomposition can remove 
hundreds of degrees of freedom from the overall protein framework and serve as a 
natural coarse graining step to speed up protein dynamics simulations [19, 46]. For 
example, the all-atom geometric simulation method FRODA (Framework Rigidity 
Optimized Dynamic Algorithm) (which runs about 100,000 times faster than MD 
simulations) [20] uses rigid clusters as a preprocessing step to explore the conforma- 
tional space of the protein motions. The rigid clusters, whose size and number depend 
on the selected energy threshold and the type of protein structure being analysed, can 
be kept fixed as rigid body geometrical components in the simulation motion (see 
Fig. 14.13). The atoms belonging to a rigid cluster can only move by utilizing trivial 
rigid body degrees of freedom. With this in mind, simulations can be focused on simulating
the relevant degrees of freedom belonging to intermediate flexible regions.

Fig. 14.13 Geometric simulation as used in FRODA/FRODAN. a A part of a 2D slice through
the 3N-dimensional conformational space, where red indicates disallowed states and blue indicates
allowed states [21]. A random move (green arrows) is accepted if it falls within a blue region
(green dots) and rejected if it falls within a red region (yellow dots), followed by enforcement of
the constraints (yellow arrows). The black path produces a valid geometric path within the allowed
conformational space. Any rigid region (which can be potentially very large) identified with FIRST
moves as a single rigid body within FRODA, or as very small rigid clusters or individual atoms within
FRODAN. b FRODA was applied to a large antibody protein to explore the large-scale motions of
arms (green and orange) of the Y-shaped antibody structure, where three distinct colours represent
three separate large rigid bodies. c FRODAN dynamics simulation illustrating internal dynamics of
a Spike protein [47]

FRODA rapidly generates geometrically valid conformations that are consistent with
bond lengths and angular constraints while maintaining all rigid clusters. In these 
protein motion simulations, we need to add the van der Waals collisions of atoms as 
constraints, where only allowed geometries (valid stereochemistry, bonding angles, 
Ramachandran plots etc.) accessible to protein motions are simulated. Figure 14.13b 
shows the output of FRODA for an antibody protein, which exemplifies large ampli- 
tude motions. 
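
The accept-or-reject idea behind constrained geometric simulation can be illustrated with a toy Monte Carlo walk on a short 2D chain. This sketch is not FRODA or FRODAN: the names `is_allowed` and `geometric_walk`, the 2D setting, and all numerical tolerances are assumptions chosen for illustration, and rejected moves are simply discarded rather than repaired by a constraint-enforcement step. A random displacement of one atom is kept only if every bond length stays within a tolerance and no pair of non-bonded atoms overlaps.

```python
import math
import random


def is_allowed(coords, bonds, bond_len, tol, min_dist):
    """True if all bond lengths are within tol and no non-bonded atoms clash."""
    for i, j in bonds:
        if abs(math.dist(coords[i], coords[j]) - bond_len) > tol:
            return False
    bonded = {frozenset(b) for b in bonds}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if frozenset((i, j)) in bonded:
                continue
            if math.dist(coords[i], coords[j]) < min_dist:   # steric overlap
                return False
    return True


def geometric_walk(coords, bonds, steps=1000, step=0.05,
                   bond_len=1.0, tol=0.1, min_dist=0.8, seed=0):
    """Random walk through the allowed region of conformational space."""
    rng = random.Random(seed)
    coords = [tuple(c) for c in coords]
    accepted = 0
    for _ in range(steps):
        i = rng.randrange(len(coords))
        trial = list(coords)
        trial[i] = (coords[i][0] + rng.uniform(-step, step),
                    coords[i][1] + rng.uniform(-step, step))
        if is_allowed(trial, bonds, bond_len, tol, min_dist):
            coords = trial                 # move lands in an allowed state
            accepted += 1
    return coords, accepted


start = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
bonds = [(0, 1), (1, 2), (2, 3)]
final, accepted = geometric_walk(start, bonds, steps=500)
```

Because only allowed moves are kept, every conformation visited by the walk satisfies the bond-length and overlap constraints by construction. Treating a rigid cluster as a single body, as described in the text, would replace the per-atom displacement with a rigid-body translation and rotation of the whole cluster.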

We have applied and extended FRODA, using the related constrained geomet- 
ric simulation programme FRODAN [21], which, like FRODA, provides very fast 
motion simulations but is better suited for proteins that are not dominated by large 
rigid clusters. In a FRODAN simulation, the rigid clusters are typically small, from 
single atoms up to small rigid cycles (e.g., proline rings and rigid loops). This makes 
FRODAN useful for simulations of protein motions that include substantial unfold- 
ing and refolding and analysing motions of intrinsically disordered proteins. Indeed, 
we have utilized a similar approach in combination with an experimental measure 
of dynamics, hydrogen-deuterium exchange, to characterize the highly complex 
motions and conformational ensemble of a large intrinsically disordered Tau protein 
[22]. Tau protein is a key protein in a number of pathologies and dementias such as
Alzheimer's disease, and its primary physiological role is to stabilize microtubules
in neuronal axons at all stages of development.

Fig. 14.14 a Tau protein is a large intrinsically disordered protein. Because of its high flexibility
and disordered structure, it is able to take a wide variety of shapes, which makes it difficult to study
with conventional MD simulations. b By performing large rigidity theory geometric simulations
using FRODAN and its extensions, we were able to characterize the representative structures for the
native and defective (i.e., hyperphosphorylated) forms of Tau, which were shown to be in agreement
with HDX experimental data (The figure in b is adapted from [22])

One of the main challenges in understanding
the Tau structure–function relationship and finding successful therapeutics
for Alzheimer’s disease is the poor understanding of the atomic structural ensemble 
and dynamics of the Tau protein. Moreover, Tau protein undergoes modifications to 
its shape and internal dynamics as mediated by a hyperphosphorylation defect. By 
performing FRODAN simulations and our various extensions, we were able to provide
the first detailed view of the structural and dynamic characteristics
of both the normal and the defective hyperphosphorylated forms of Tau [22]. This 
study provided a rich understanding of the structural basis of Tau pathology (see 
Fig. 14.14). 

FRODA, FRODAN and our various extensions can be applied to probe the dynam- 
ics of very large structures such as Spike proteins [47] or disordered proteins, which 
provides a significant advantage over traditional MD simulations. Probing motions of 
intrinsically disordered proteins with MD simulation is extremely challenging, if not 
essentially impossible, owing to their highly dynamic character. The rigidity theory-inspired
methodologies FRODA/N discussed here can be run in either targeted or
non-targeted mode, and we have recently combined these techniques with search
algorithms in reinforcement learning (under review). The targeted mode employs a
biasing force during transitioning, while the non-targeted mode explores unbiased
random fluctuations, which enables the exploration of a broad conformational space.
Additionally, the targeted mode is useful for determining the conformational transition
pathways between distinct conformations (e.g., opening and closing motions
such as hinge-bending motions, GPCR activation, etc.).


14.5 Protein Structure Validation with Rigidity Theory


We now discuss another application of rigidity theory to structural biology. In a very 
recent study, we made an important breakthrough in the area of protein structure 
validation [48, 49]. 

Experimentally solved protein structures are only useful if they are known to be 
accurate and realistically represent the protein structures in their native environment. 
The vast majority of protein structures in the Protein Data Bank [50] have been solved 
by X-ray crystallography or NMR experiments. Both X-ray crystal structures and 
NMR structures are only model representations of experimental data, which are prone 
to uncertainties and errors. It is widely accepted that experimentally solved protein 
structures must be validated with (i) geometric tests and (ii) how well structures 
match input experimental data (restraints) [51]. Geometric criteria are easy to check 
for both X-ray and NMR structures, and measurements like R factor and Rfree values 
can be used to check how well X-ray structures match input X-ray diffraction data 
[48]. Unfortunately, no such validation criteria exist for NMR structures [51], and 
unlike crystal structures, validating the quality of NMR structures has been extremely 
difficult. In fact, since the first protein was determined by NMR in 1985 until now, 
there has been no effective method for NMR protein structural validation, which has 
largely limited the applications and use of NMR structures among protein researchers 
[51–55]. This has created a problem not only for users of structural information, but
also for scientists who use NMR to computationally solve structures and want to 
know how accurate their solved structure is. 

While structures solved by NMR represent less than 10% of all structures in
the PDB, they are extremely important, as not all proteins can be crystallized and NMR
structures also include a high proportion of proteins with under-represented folds 
(shapes). NMR structures are determined in solution (a protein's natural environ- 
ment), whereas X-ray structures are determined in a crystalline environment, which 
arguably makes NMR structures more representative of in vivo structures. Hence, 
there has been a pressing need to find an acceptable validation measure for NMR 
structures. 

We have developed the method ANSURR (Accuracy of NMR Structures Using 
Random Coil Index and Rigidity) [48], which addresses this critical long-standing 
gap for NMR protein structure validation. ANSURR assesses the quality of NMR 
structures by comparing two measures of local protein rigidity, one derived from 
the original NMR input data and the other derived from rigidity theory prediction 
of protein flexibility using structural data. The measure of rigidity using input data 
is based on the Random Coil Index (RCI), which uses experimental NMR chemical 
shifts (a readily available data type for each NMR structure) to quantify the extent 
of disordered structure for each amino acid in solution. The second measure is based 
on FIRST and our rigidity theory extensions, which involves calculating the dilution 
plot (see Fig. 14.9) and extracting a flexibility score for each residue. ANSURR then 
compares these two measures of local rigidity and provides a residue-by-residue test 
of how well the rigidity of the structure (obtained from rigidity theory) compares
to the experimentally determined (true, RCI chemical shift) rigidity.

Fig. 14.15 a The ANSURR method evaluates the accuracy of nuclear magnetic resonance (NMR)
protein structures (which are given as an ensemble of models) by comparing two measures of protein
flexibility (orange, predicted from structure using mathematical rigidity theory and extensions of
the method FIRST; blue, derived from the random coil index [RCI] using experimental NMR
chemical shift data). b Analysis of ANSURR using four models from NMR (Protein Data Bank ID,
1e17). ANSURR provides two metrics for accuracy: a correlation score between FIRST (rigidity)
and RCI and a root mean square difference (RMSD) score. The structures in the top right portion of
the plot (high correlation and high RMSD scores) are high-quality NMR structures, and structures
in the bottom left of the plot are considered poor structures (Figure adapted from [48]). c ANSURR
output for an example NMR structure (Protein Data Bank ID, 2kpp) that has high accuracy for most
models in the ensemble

ANSURR provides two metrics for accuracy measurement. One is a correlation score between
FIRST (rigidity) and RCI, which assesses the accuracy of protein folding (secondary 
structures), and the second is an RMSD score, which measures how well the overall 
rigidity and flexibility between FIRST and RCI match (Fig. 14.15). 
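
The comparison of two per-residue rigidity profiles can be illustrated with a small Python sketch. It is a simplified stand-in and not the ANSURR scoring procedure: the function name `compare_profiles` is hypothetical, and the use of a plain Pearson correlation and an unnormalised root mean square difference is an assumption, since the text does not specify how the published scores are computed or scaled.

```python
import math


def compare_profiles(predicted, observed):
    """Return (Pearson correlation, root mean square difference) of two
    equal-length per-residue profiles, e.g. FIRST-derived and RCI-derived."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("profiles must have the same length of at least two")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant profile")
    correlation = cov / math.sqrt(var_p * var_o)
    rmsd = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    return correlation, rmsd


first_profile = [0.1, 0.4, 0.9, 0.3]             # made-up illustrative values
rci_profile = [x + 0.2 for x in first_profile]   # same shape, uniformly floppier
correlation, rmsd = compare_profiles(first_profile, rci_profile)
```

The two illustrative profiles have the same shape but differ by a constant offset, so the correlation is essentially perfect while the root mean square difference is not zero. This mirrors the distinction drawn in the text between getting the pattern of rigid and flexible regions right and matching the overall level of rigidity.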

Unlike crystal structures, NMR structures are always represented as an ensemble 
of (typically around 20) possible structural models. Because it is unclear which 
models are useful or accurate, this has created substantial and unnecessary confusion 
for users of NMR structures. A nice feature of ANSURR is its ability to estimate the 
accuracy of each model.

The performance of ANSURR was tested using several approaches [48]. First,
ANSURR was applied to structures refined in an explicit solvent (which were found
to be much better than unrefined structures), and then ANSURR was applied to a
large set of good and bad structures (using decoy generations). ANSURR was also
compared against previously proposed measures of accuracy (mostly restraint-based 
tests and geometric checks). Several of these indicators, such as restraint violations 
and restraints per residue, were shown to be poor measures of accuracy. On the 
other hand, a Ramachandran analysis (a standard check to determine if a protein 
backbone has a correct geometry) was found to be a useful geometric check of 
accuracy. A typical comparison of how well a structure compares to another structure 
is the backbone root mean square deviation, which can show if protein structures 
resemble each other when superimposed. However, this measure may miss many 
of the important structural differences found in amino acid side-chain orientations, 
which are responsible for forming critical hydrogen bonding interactions that have 
a direct impact on protein stability and functional aspects such as protein dynamics 
and enzyme catalysis. As rigidity measures are sensitive to side chains, ANSURR 
can also be used to assess the quality of side-chain atomic positions, which makes it 
a powerful tool for the assessment and refinement of protein structures. 

Recent work [49] applied ANSURR to more than 7000 NMR structures in the 
PDB, showing that NMR structures span a wide range of accuracy. Most NMR 
structures have accurate secondary structures, but are too floppy, particularly in their 
loops. Our studies also indicate that both crystal structures and NMR structures have 
equally accurate secondary structural elements (helices, sheets), but crystal structures 
are typically too rigid in disordered regions, whereas NMR structures are too flexible 
overall. 

Development of ANSURR is a major advancement in the long-standing prob- 
lem of protein structure validation, as it provides the first workable measure of the 
accuracy of NMR structures and is expected to give researchers more confidence in 
the use and application of structural NMR. Ultimately, this should lead to a better 
understanding of how proteins perform their functions, with general implications for 
structural biology research. This work opens up enormous new research avenues in 
protein structure determination and the improvement of standards for protein struc- 
ture refinement. 


14.6 Conclusion 


Studying the rigidity and flexibility of geometric frameworks has advanced consid- 
erably since Maxwell's combinatorial characterization of the rigidity of mechanical 
frameworks in the 1800s. Mathematical advancements in rigidity theory over the 
last two decades have been tremendous, opening up many exciting opportunities in 
applied sciences and engineering. In this chapter, we have reviewed some of the 
latest advances in rigidity theory and its applications for the analysis of protein 
function at an atomistic scale. Moreover, we have shown how rigidity theory-based
methods and our various algorithms and extensions can rapidly and accurately predict
protein flexibility and dynamics, which can be used to decipher various aspects
of protein function, including elusive issues of allostery, enzyme catalysis, GPCR
signalling, or motions of intrinsically disordered proteins. Our recent application
of rigidity theory to protein structure validation has led to the first
workable method for validating NMR protein structures. This advance will provide
confidence to users of protein structures and is expected to accelerate and improve 
the process of protein structure determination and aid computational drug discovery. 
Rigidity theory is heavily rooted in deep mathematical formulations in the area of dis- 
crete applied geometry and combinatorics, which has unfortunately remained largely 
inaccessible to most researchers in applied science and engineering fields. While 
there has been some cross-fertilization between the various scientific fields studying 
different aspects of rigidity and flexibility, stronger interactions and interdisciplinary 
training are needed between applied and theoretical scientific communities to realize 
the enormous potential of rigidity theory applications. We advocate that rigidity the- 
ory, through both algorithmic and mathematical progress, has significantly advanced 
such that it could be widely applied in the analysis of structural biological data, which 
can complement experimental approaches to reveal novel insights into intractable and
fundamental biological enigmas of living organisms. Rigidity theory exemplifies how
mathematics and algorithms can make significant contributions to structural biology, 
biological big-data analyses, and progress in biological applications.


Chapter 15
and Walking-Home Routes from Osaka City After a Nankai Megathrust Earthquake Using Road Network Big Data


Atsushi Takizawa and Yutaka Kawagishi 


Abstract When a disaster such as a large earthquake occurs, the resulting breakdown 
in public transportation leaves urban areas with many people who are struggling 
to return home. With people from various surrounding areas gathered in the city, 
unusually heavy congestion may occur on the roads when the commuters start to 
return home all at once on foot. In this chapter, it is assumed that a large earthquake 
caused by the Nankai Trough occurs at 2 p.m. on a weekday in Osaka City, where there 
are many commuters. We then assume a scenario in which evacuation from a resulting 
tsunami is carried out in the flooded area and people return home on foot in the other 
areas. At this time, evacuation and returning-home routes with the shortest possible 
travel times are obtained by solving the evacuation planning problem. However, the 
road network big data for Osaka City make such optimization difficult. Therefore, we 
propose methods for simplifying the large network while keeping those properties 
necessary for solving the optimization problem and then recovering the network. The
obtained routes are then evaluated by large-scale pedestrian simulation to confirm the
effect of the optimization.


15.1 Introduction 


When a disaster such as a large earthquake occurs, the resulting breakdown in public 
transportation leaves urban areas with many people who are struggling to return 
home. With people from various surrounding areas gathered in the city, unusually
heavy congestion may occur on the roads when the commuters start to return home
all at once on foot. In Japan, the Great East Japan Earthquake on March 11, 2011
left many people in central Tokyo unable to return home, and roads were flooded
with pedestrians attempting to do so. After the Northern Osaka Earthquake on June 18,
2018, the Shin-Yodogawa Bridge and its surroundings were extremely congested by
displaced people crossing the Yodo River from Umeda.

Reflecting on such confusion, many local governments have already decided
on countermeasures for people who are struggling to return home [16]. Common 
among these countermeasures is that people who need to return home from their 
places of work are urged not to do so immediately after the disaster but rather to 
remain in place. Meanwhile, although it is known empirically that great confusion 
arises when difficulties in returning home occur, the associated countermeasures tend 
to be approximate because it is not known how much congestion occurs and where 
until after the disaster has occurred. Pedestrian simulation of the whole city would 
seem useful in such cases, but this has not been attempted until recently because doing 
so requires large-scale and detailed data and a high-speed calculation environment. 
However, Hiroi et al. carried out a simulation of mass returning-home behavior on 
foot after a large earthquake for an area within 40 km of Tokyo Station [3].

In the case of western Japan, such as Osaka City, an earthquake originating from 
the Nankai Trough is the most dangerous. In Osaka City, the resulting tsunami is 
predicted to reach the shore in 1 h 50 min and flood about half of the city [14].
One of the major problems with the tsunami is that it will travel up the Yodo River
in the northern part of Osaka City and spread to the coastal area. As mentioned
above, the areas around bridges crossing the Yodo River became congested with people
returning home after the 2018 Northern Osaka Earthquake, a phenomenon that Kawagishi
and Takizawa predicted by means of a large-scale simulation of returning home from
Osaka City [7].

However, if the timings of the tsunami flooding and the movement of people
overlap, a large-scale secondary disaster may occur. Therefore, in Osaka City, it
is necessary to consider risks such as delayed escape from the tsunami along with the
countermeasures for people who are struggling to return home, but the current
countermeasures can hardly be said to consider such risks. The purpose of the study
by Kawagishi and Takizawa [7] was to investigate how a Nankai Trough earthquake
would affect the return home of commuters in Osaka City. The results confirmed
that bridges over the Yodo River from the center of Osaka City would be congested
for a long time with people crossing, and that people remaining in the vicinity would
be in danger of delayed escape from the tsunami. However, that study did not
consider evacuation behavior from the tsunami, and it assumed that people who walk
home take the shortest route to do so. Therefore, problems remained, such as excessive
concentration of pedestrians on specific roads and bridges.

In the present study, it is assumed that a large earthquake caused by the Nankai
Trough occurs at 2 p.m. on a weekday in Osaka City, where there are many
commuters. We then assume a scenario in which evacuation from the tsunami is carried
out in the flooded area and people return home on foot in the other areas. At this time,
the evacuation and returning-home routes with the shortest possible travel times are
obtained by solving the evacuation planning problem [8, 19]. However, the road
network big data for Osaka City make such optimization difficult. Therefore, we
propose methods for simplifying the large network while keeping those properties
that are necessary for solving the optimization problem and then recovering the
network. The obtained routes are then evaluated by large-scale pedestrian simulation
to confirm the effect of the optimization.

The remainder of this chapter is organized as follows. The next section explains 
the evacuation planning model. Next, the pedestrian simulation model for a large- 
scale network model is described. Then, the results are discussed, and conclusions 
and suggestions for future work are presented. 


15.2 Quickest Evacuation Planning Problem 


This section describes the quickest evacuation planning problem based on a dynamic 
network in which the flow rate changes over time. By contrast, a network in which
the flow rate does not change over time is called a static network. 


15.2.1 Dynamic Network 


We define a directed graph $D = (V, E)$ with vertex set $V$ and edge set $E$. In $V$, the
sources (i.e., the starting vertices of the flow) and sinks (i.e., the destinations) are
given. A directed edge with start vertex $u \in V$ and end vertex $v \in V$ is expressed
as $e = (u, v)$; the start vertex of $e$ is expressed as $\mathrm{tail}(e)$ and the end
vertex as $\mathrm{head}(e)$. For vertex $v \in V$, $\delta_D^+(v) \subseteq E$ is defined
as the set of edges going out of $v$ and $\delta_D^-(v) \subseteq E$ as the set of edges
going toward $v$. For each edge $e \in E$, we define a travel time function
$\tau : E \to \mathbb{Z}_+$ that denotes the time required to flow on $e$ from
$\mathrm{tail}(e)$ to $\mathrm{head}(e)$. The maximum value of the flow on $e$ is denoted
by the capacity function $c : E \to \mathbb{R}_+$. For each vertex $v \in V$, we define a
supply function $b : V \to \mathbb{R}_+$ that denotes the amount of supply at that vertex,
and we denote the set of vertices with one or more supplies by $S^+ \subseteq V$.
Furthermore, the sink set $S^- \subseteq V$ is also defined.

Using the above definitions, a dynamic network $\mathcal{N} = (D, c, \tau, b, S^+, S^-)$ is
defined, and Fig. 15.1 shows an example of $\mathcal{N}$. Assuming application to evacuation
planning, the flow denotes the movement of evacuees, a sink denotes an evacuation
site, and the flow reaching a sink denotes the accommodation of evacuees at that
evacuation site. An evacuee who arrives at a vertex moves on an edge and is deemed
evacuated upon reaching a sink. The total number of evacuees at point $v \in V$ is
regarded as the supply $b(v)$ at that point.
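
As an illustration, the components of the dynamic network can be held in a small data structure. The following is a minimal Python sketch; the class and field names (`Edge`, `DynamicNetwork`, `travel_time`, and so on) and the toy instance are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    tail: str          # tail(e)
    head: str          # head(e)
    travel_time: int   # travel time of e, in discrete time steps
    capacity: float    # c(e): maximum flow that may enter e per time step


@dataclass
class DynamicNetwork:
    vertices: set
    edges: list
    supply: dict       # b(v): number of evacuees initially at vertex v
    sinks: set         # sink set S^-

    def sources(self):
        """Source set S^+: vertices with positive supply."""
        return {v for v in self.vertices if self.supply.get(v, 0) > 0}

    def out_edges(self, v):
        """Edges going out of v."""
        return [e for e in self.edges if e.tail == v]

    def in_edges(self, v):
        """Edges going toward v."""
        return [e for e in self.edges if e.head == v]


# Hypothetical toy instance with three sources and one sink.
toy = DynamicNetwork(
    vertices={"v1", "v2", "v3", "v4"},
    edges=[Edge("v1", "v3", 1, 2), Edge("v2", "v3", 2, 1), Edge("v3", "v4", 1, 2)],
    supply={"v1": 3, "v2": 2, "v3": 1},
    sinks={"v4"},
)
```

Deriving the source set from the supply function, rather than storing it separately, keeps the two definitions from drifting apart.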

Next, we define a dynamic flow $f : E \times \mathbb{Z}_+ \to \mathbb{R}_+$ on dynamic
network $\mathcal{N}$ as the flow rate entering the edge $e \in E$ at discrete time
$\theta \in \mathbb{Z}_+$, and it is expressed as $f(e, \theta)$. Note that the flow that
enters $\mathrm{tail}(e)$ of edge $e$ at time $\theta$ arrives at $\mathrm{head}(e)$
at time $\theta + \tau(e)$.

Fig. 15.1 Example of a dynamic network $\mathcal{N}$ with $S^+ = \{v_1, v_2, v_3\}$ and $S^- = \{v_4\}$

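On the dynamic flow, the following three constraints are defined. First, the capacity
constraint is given by

$$0 \le f(e, \theta) \le c(e) \quad (\forall e \in E,\ \theta \in \mathbb{Z}_+), \tag{15.1}$$

then the flow conservation law is given by

$$\sum_{e \in \delta_D^+(v)} \sum_{\theta=0}^{\Theta} f(e, \theta) - \sum_{e \in \delta_D^-(v)} \sum_{\theta=0}^{\Theta - \tau(e)} f(e, \theta) \le b(v) \quad (\forall v \in V,\ \forall \Theta \in \mathbb{Z}_+), \tag{15.2}$$

and the demand constraint is given by

$$\sum_{s \in S^-} \sum_{e \in \delta_D^-(s)} \sum_{\theta=0}^{\Theta - \tau(e)} f(e, \theta) = \sum_{v \in V} b(v) \quad (\exists \Theta \in \mathbb{Z}_+). \tag{15.3}$$

A dynamic flow that satisfies these three constraints is said to be feasible, and the
feasible dynamic flow that achieves the minimum time $\Theta^*$ is called the quickest
flow. The quickest evacuation planning problem is to find the minimum evacuation
completion time $\Theta^*$.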
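
The three constraints can be checked mechanically for a given flow. The sketch below assumes that a dynamic flow is stored as a dictionary mapping (edge index, time) to the amount entering that edge at that time, that edges are tuples (tail, head, travel time, capacity), and that no flow enters any edge after the horizon being tested; the function name `is_feasible` and this representation are illustrative choices only.

```python
def is_feasible(edges, supply, sinks, flow, horizon, eps=1e-9):
    """Check capacity, conservation, and demand for a flow within `horizon`.

    edges:  list of (tail, head, travel_time, capacity)
    supply: dict vertex -> initial supply b(v)
    sinks:  set of sink vertices
    flow:   dict (edge_index, theta) -> amount entering the edge at time theta
    """
    # Capacity constraint: 0 <= f(e, theta) <= c(e).
    for (i, theta), x in flow.items():
        if x < -eps or x > edges[i][3] + eps:
            return False

    vertices = set(supply) | set(sinks)
    for tail, head, _, _ in edges:
        vertices |= {tail, head}

    # Flow conservation: up to every time Theta, what has left v minus what
    # has arrived at v must not exceed the supply b(v).
    for big_theta in range(horizon + 1):
        for v in vertices:
            out = sum(x for (i, th), x in flow.items()
                      if edges[i][0] == v and th <= big_theta)
            arrived = sum(x for (i, th), x in flow.items()
                          if edges[i][1] == v and th <= big_theta - edges[i][2])
            if out - arrived > supply.get(v, 0) + eps:
                return False

    # Demand constraint: all supply has arrived at the sinks by `horizon`.
    at_sinks = sum(x for (i, th), x in flow.items()
                   if edges[i][1] in sinks and th <= horizon - edges[i][2])
    return abs(at_sinks - sum(supply.values())) <= eps


# Hypothetical example: path a -> b -> t, unit travel times and capacities.
path_edges = [("a", "b", 1, 1), ("b", "t", 1, 1)]
path_flow = {(0, 0): 1, (0, 1): 1, (1, 1): 1, (1, 2): 1}
```

With two evacuees at `a`, the flow above completes at time 3, so it passes the check for a horizon of 3 but fails the demand constraint for a horizon of 2.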

Considering application to an actual evacuation planning problem, there is an
upper limit on the number of evacuees that can be accepted at each sink, which is an
evacuation center. As defined by Kamiyama et al. [6], it is assumed that the capacity
function $l : S^- \to \mathbb{Z}_+$ pertains to each sink, and the feasible flow $f$ satisfies

$$\sum_{e \in \delta_D^-(s)} \sum_{\theta=0}^{\Theta} f(e, \theta) \le l(s) \quad (s \in S^-,\ \forall \Theta \in \mathbb{Z}_+). \tag{15.4}$$


15.2.2 Time-Expanded Network


Ford and Fulkerson [1, 2] proposed the time-expanded network to obtain the quickest
flow. This is a static network corresponding to dynamic network $\mathcal{N}$ with time
constraint $\Theta$, and it is designated as $\mathcal{N}(\Theta)$. The set of vertices for
$\mathcal{N}(\Theta)$ is defined by

$$\{v(\theta) \mid v \in V,\ \theta \in \{0, \ldots, \Theta\}\}. \tag{15.5}$$

That is, for vertex $v$ of the original network, vertex $v(\theta)$ is provided corresponding
to each time $\theta \in \{0, \ldots, \Theta\}$ (see Fig. 15.2).

The edge set of $\mathcal{N}(\Theta)$ consists of two parts. First, for each edge
$e = (u, v) \in E$ and each time $\theta \in \{0, \ldots, \Theta - \tau(e)\}$, we have edge
$e(\theta) = (u(\theta), v(\theta + \tau(e)))$ of capacity $c(e)$. Second, for each vertex
$v \in V$ and time $\theta \in \{0, \ldots, \Theta - 1\}$, we add stagnant edges
$(v(\theta), v(\theta + 1))$ of capacity $+\infty$ (the horizontal edges in Fig. 15.2).
For each vertex $v \in V$, the supply of $v(0)$ is defined as $b(v)$. The supply of
$v(\theta)$ for $\theta \in \{1, \ldots, \Theta\}$ is set to zero. Let the sink set of
$\mathcal{N}(\Theta)$ be $\{s(\theta) \mid s \in S^-,\ \theta \in \{0, \ldots, \Theta\}\}$.
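
The construction above translates almost line by line into code. The following sketch builds the node set, the two kinds of edges, the supplies, and the sink set of a time-expanded network; the function name and the tuple layout (tail, head, travel time, capacity) are assumptions made for illustration.

```python
def time_expanded_network(vertices, edges, supply, sinks, horizon):
    """Build the time-expanded network for time constraint `horizon`.

    Nodes are (vertex, time) pairs; arcs are (from_node, to_node, capacity).
    """
    nodes = [(v, th) for v in vertices for th in range(horizon + 1)]

    arcs = []
    # Movement edges: a copy of e leaving at each time th that still
    # arrives no later than the horizon.
    for u, v, tau, cap in edges:
        for th in range(horizon - tau + 1):
            arcs.append(((u, th), (v, th + tau), cap))
    # Stagnant edges: waiting at a vertex for one time step, uncapacitated.
    for v in vertices:
        for th in range(horizon):
            arcs.append(((v, th), (v, th + 1), float("inf")))

    # Supply sits only on the time-0 copies; all later copies have zero supply.
    node_supply = {(v, 0): supply.get(v, 0) for v in vertices}
    expanded_sinks = {(s, th) for s in sinks for th in range(horizon + 1)}
    return nodes, arcs, node_supply, expanded_sinks
```

For a network with |V| vertices and |E| edges, this yields |V|(Θ + 1) nodes and at most |V|Θ + |E|(Θ + 1) arcs, which is why the size of the expanded network grows with both the road network and the time horizon.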


15.2.3 Algorithm for Solving Quickest Evacuation Planning 
Problem 


Ford and Fulkerson [1, 2] showed that the necessary and sufficient condition for the
evacuation completion time in dynamic network $\mathcal{N}$ to be less than or equal to
$\Theta$ is that there exists a flow of size $\sum_{s \in S^+} b(s)$ from the source set
$\{s(0) \mid s \in S^+\}$ to the sink set in $\mathcal{N}(\Theta)$. The existence of such a
feasible flow can be examined by obtaining the maximum flow of $\mathcal{N}(\Theta)$. To
consider the sink capacity for evacuation sites, we add a super sink $s^*$ and edges of
capacity $l(s)$ from $s(\theta)$ to $s^*$ for $s \in S^-$ and
$\theta \in \{0, \ldots, \Theta\}$. Then, the necessary
and sufficient condition for the evacuation completion time to be less than or equal
to $\Theta$ in dynamic network $\mathcal{N}$ is that the aforementioned feasible flow exists
in the time-expanded network $\mathcal{N}(\Theta)$.

Fig. 15.2 Time-expanded network $\mathcal{N}(4)$ of Fig. 15.1 with super sink $s^*$
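
The feasibility test can be sketched as a maximum-flow computation that is repeated for increasing time horizons. The code below is only an illustration under several assumptions: it uses a plain Edmonds-Karp maximum flow rather than the LEDA routines used in this study, it adds a hypothetical super source `"SRC"`, and it enforces each sink's total capacity by routing all time copies of a sink through one collector node before the super sink `"SINK"`, which is a modeling choice of this sketch.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; modifies cap in place."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Find the bottleneck on the augmenting path, then push flow along it.
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck


def feasible_within(edges, supply, sink_cap, horizon):
    """True if all supply can reach the sinks by time `horizon`."""
    cap = defaultdict(lambda: defaultdict(float))
    vertices = set(supply) | set(sink_cap)
    for u, v, _, _ in edges:
        vertices |= {u, v}
    for v, b in supply.items():
        if b > 0:
            cap["SRC"][(v, 0)] = b
    for u, v, tau, c in edges:
        for th in range(horizon - tau + 1):
            cap[(u, th)][(v, th + tau)] += c
    for v in vertices:
        for th in range(horizon):
            cap[(v, th)][(v, th + 1)] = INF
    for s, limit in sink_cap.items():
        for th in range(horizon + 1):
            cap[(s, th)][("collect", s)] = INF
        cap[("collect", s)]["SINK"] = limit
    return max_flow(cap, "SRC", "SINK") == sum(supply.values())


def quickest_time(edges, supply, sink_cap, max_horizon=1000):
    """Smallest horizon for which evacuation is feasible, or None."""
    for horizon in range(max_horizon + 1):
        if feasible_within(edges, supply, sink_cap, horizon):
            return horizon
    return None
```

Because feasibility is monotone in the horizon, the linear scan in `quickest_time` could be replaced by a binary search when the horizon is large.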

In this way, it is possible to obtain the quickest flow of evacuation planning
in pseudo-polynomial time using the time-expanded network, but as the size of
the actual network increases, so does that of the time-expanded one. Alternatively,
Hoppe and Tardos [4, 5] proposed a quickest transport algorithm that does not use
a time-expanded network. However, although it is a polynomial-time algorithm, it
must minimize a submodular function iteratively, and this algorithm is currently
inefficient for a large-scale network such as the one in the present study.

Generally, there is more than one quickest flow. The one that maximizes the
cumulative number of evacuees who have been evacuated at every time up to the
evacuation completion time $\Theta$ is called the universal quickest
flow. This is obtained by first finding the evacuation completion time $\Theta$ and then
finding the flow known as the lexicographic maximum flow [9] on the corresponding
time-expanded network. When the sinks are subjected to the capacity constraint,
the universal quickest flow does not always exist. However, when this constraint
is imposed, the obtained flow is experimentally similar to the universal quickest
flow [19].


15.3 Pedestrian Simulation Model 


Because both the travel time and time interval of a dynamic network model are 
approximate, pedestrian simulations are carried out for the obtained route to improve 
the accuracy, and the travel time and congestion are confirmed. Because the present 
study deals with a large-scale road network, we use the one-dimensional pedestrian 
model with high computational efficiency developed by Yamashita et al. [20]. In this 
model, pedestrians walking in the same direction move in a row on an edge. This 
row is called a lane, and the number of lanes is determined according to the width of 
the sidewalk as determined in Sect. 15.4.1. As illustrated in Fig. 15.3, it is assumed 
that pedestrians move in their specified lane and do not overtake. A discrete-time 
simulation is performed to determine the speed of each pedestrian in a lane at the 
next time step from their current speed and the distance between each pedestrian and 
the one walking immediately in front. 

In a lane as illustrated in Fig. 15.3, the leading pedestrian is defined as the one
closest to the target node. Let $x_i(t)$ be the distance of the $i$-th pedestrian from the
starting vertex of the edge at time $t$. The velocity $\dot{x}_i(t + \delta t)$ of
pedestrian $i$ in the lane at time $t + \delta t$ is considered to depend on the current
velocity of the pedestrian and the distance to the pedestrian walking immediately in front,
and it is determined by

$$\dot{x}_i(t + \delta t) = \dot{x}_i(t) + \left( a_1 \left( v_0 - \dot{x}_i(t) \right) - a_2 \exp\left( \frac{2r - \left( x_{i-1}(t) - x_i(t) \right)}{a_3} \right) \right) \delta t, \tag{15.6}$$

where pedestrian $i - 1$ is the one walking immediately in front of pedestrian $i$,
$v_0$ is the free walking speed, $r$ is the radius of the pedestrian, and $a_1$, $a_2$,
and $a_3$ are parameters. According to a previous study [20], we set $v_0 = 1.023$ [m/s],
$r = 0.522$ [m], $a_1 = 0.962$, $a_2 = 0.869$, and $a_3 = 0.214$.

Fig. 15.3 Movement of pedestrians in a lane according to one-dimensional pedestrian model
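
One discrete-time step of such a lane model might be sketched as follows. The update form is an assumption: the acceleration is taken to be $a_1 (v_0 - \dot{x}_i) - a_2 \exp((2r - d_i)/a_3)$, where $d_i$ is the gap to the pedestrian in front, integrated with an explicit Euler step. Clamping the speed at zero and the function names are further choices of this sketch, not part of the model in [20].

```python
import math

V0 = 1.023   # free walking speed [m/s]
R = 0.522    # pedestrian radius [m]
A1, A2, A3 = 0.962, 0.869, 0.214


def next_speed(speed, gap, dt):
    """Speed at the next time step, given the gap to the pedestrian in front.

    Use gap = math.inf for the leading pedestrian (no one in front).
    """
    repulsion = 0.0 if math.isinf(gap) else A2 * math.exp((2 * R - gap) / A3)
    accel = A1 * (V0 - speed) - repulsion
    # Assumption of this sketch: pedestrians never walk backwards.
    return max(0.0, speed + accel * dt)


def step_lane(positions, speeds, dt):
    """Advance one lane by dt; index 0 is the leading pedestrian."""
    new_speeds = []
    for i, (x, v) in enumerate(zip(positions, speeds)):
        gap = math.inf if i == 0 else positions[i - 1] - x
        new_speeds.append(next_speed(v, gap, dt))
    new_positions = [x + v * dt for x, v in zip(positions, new_speeds)]
    return new_positions, new_speeds
```

Under this form, a pedestrian with a free path relaxes toward the free walking speed, whereas a pedestrian whose gap falls below roughly two radii decelerates sharply.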


15.4 Data Preparation 


The geographic information system (GIS) datasets used in this study are listed in 
Table 15.1, and Fig. 15.4 shows the city of Osaka covered by this study, the 20-km 
zone within which people walk home, and the flooded area. In the following, we 
explain the data preparation. 


15.4.1 Road Network 


Based on the approach of the Cabinet Office of Japan for people struggling to return
home [10], the road network was built from the roads in Osaka City and a surrounding
20-km buffer, excluding expressways. Consequently, a large-scale
road network comprising 815739 edges and 621670 nodes was obtained. Simplification
of this large-scale road network is described in Sect. 15.5. In the case of
an earthquake due to the Nankai Trough, a seismic intensity of 6 lower is assumed
in Osaka City. Little major damage to roads and buildings is expected at
this seismic intensity; therefore, in this study buildings and roads are assumed to be
undamaged.


Table 15.1 GIS datasets used for optimization and simulation

#    Data
1    Sub-regional boundary data [17]
2    Tsunami flooding estimation area [11]
3    Road network [18]
4    Tsunami evacuation buildings [15]
5    Daytime population data [13]


Fig. 15.4 Osaka City and its 20-km surrounding area

We assume that pedestrians move on sidewalks, but there are no sidewalk data for
this road network. Therefore, referring to the regulations of the Ministry of Land,
Infrastructure and Transport [12], we sampled the sidewalk width every 10 blocks
using the distance-measuring function of Google Maps for each of six road types
obtained from the road network data, and we assigned a single sidewalk width to each
road type.

Next, a lane was defined as a strip of sidewalk along which only one person can pass
at a time, and the lane width was set uniformly to 0.75 m. The number of lanes
was set to an even value that did not exceed the determined width of the sidewalk
divided by the width of a lane. This was done so that edges opposite to each other had
the same number of lanes. For each road type, the maximum and minimum numbers
of lanes obtained under these conditions were eight and two, respectively.
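
The lane rule can be written as a one-line computation. The sketch below assumes that the largest even value satisfying the rule is the one chosen; the sidewalk widths in the usage note are hypothetical and are not the measured values.

```python
import math

LANE_WIDTH_M = 0.75  # uniform lane width used in this study


def lane_count(sidewalk_width_m, lane_width_m=LANE_WIDTH_M):
    """Largest even number of lanes that fits within the sidewalk width."""
    lanes = math.floor(sidewalk_width_m / lane_width_m)
    return lanes - lanes % 2
```

For example, a hypothetical 1.5-m sidewalk gives two lanes and a 6.0-m sidewalk gives eight, which matches the reported minimum and maximum.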


15.4.2 Tsunami Evacuation Buildings 


As tsunami evacuation buildings, we used 649 buildings designated by Osaka City
in 2016. These were entered as GIS point data, and each point was connected to the
nearest road edge by a straight line. An evacuee capacity was set for
each tsunami evacuation building. In total, 560816 people could be accommodated
in all the tsunami evacuation buildings. Figure 15.5 shows the tsunami evacuation
buildings that were used. As described in Sect. 15.6, when we optimize and simulate
the routes including the bridges over the Yodo River, the bridges in the flooded area
are set to be impassable.


Fig. 15.5 Tsunami evacuation buildings and passable bridges


15.4.3 Daytime Population 


The daytime population was calculated from mobile spatial statistics generated from 
the travel histories of users of mobile phones. As shown in Fig. 15.6, we used 500-m 
mesh data of the population at 2 p.m. on a weekday in Osaka City in April 2015. 
There were 2 696 546 residents and commuters in Osaka City during this period, but 
note that mobile spatial statistics cover only the population between 15 and 79 years 
of age. We allocated the daytime population equally to nodes of the road network in 
each mesh, and this became the initial arrangement of evacuees and stranded people. 
Because the mobile spatial statistics also contain the population of each residential
area, we chose the home node of each pedestrian randomly according
to this information.


15.4.4 Decisions on Number of People Struggling to Return 
Home and Number of Evacuees 


The polygons of the tsunami-flooded area were superimposed on the road network,
and the flooded nodes and edges were determined. For each person, the action of
evacuating, walking home, or remaining in place was chosen according to the flooded
condition of the present node, the flooded condition of the home node, and the distance
to the home node. Whether or not to return home on foot was determined by the
method used by the Cabinet Office to estimate the number of people struggling to
return home [10].

Let $R$ denote the set of commuters struggling to return home. In this approach,
the probability $P_r$ of commuter $r \in R$ deciding to return home on foot is determined
by the following equation based on the distance $d_r$ [km] from the current place to
the returning place:

$$P_r = \begin{cases} 1 & (d_r \le 10), \\ \dfrac{20 - d_r}{10} & (10 < d_r < 20), \\ 0 & (20 \le d_r). \end{cases} \tag{15.7}$$

In this study, the return distance of each person is the length of the shortest path
from their present node to their home node obtained on the road network before
simplification. For commuter $r$ whose return distance is $10 < d_r < 20$, the
action is decided probabilistically according to $P_r$ by drawing a uniformly
distributed random number. The
conditions for each action are summarized in Table 15.2.
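
The decision procedure can be sketched in a few lines. Two assumptions are made here: the return-home probability is taken to fall linearly from 1 at 10 km to 0 at 20 km, and a person whose home node is not flooded and who decides to walk home does so regardless of whether the current node is flooded. The function names are hypothetical.

```python
import random


def return_probability(d_km):
    """Probability of walking home for return distance d_km (assumed linear)."""
    if d_km <= 10:
        return 1.0
    if d_km < 20:
        return (20 - d_km) / 10
    return 0.0


def choose_action(current_flooded, home_flooded, d_km, rng=random):
    """Return 'return home', 'evacuate', or 'remain in place'."""
    walks_home = (not home_flooded) and rng.random() < return_probability(d_km)
    if walks_home:
        return "return home"
    # People who do not walk home evacuate if they stand in the flooded
    # area and otherwise stay where they are.
    return "evacuate" if current_flooded else "remain in place"
```

Passing a seeded `random.Random` instance as `rng` makes the assignment of actions reproducible across runs.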
Fig. 15.6 Distribution of commuters in Osaka City at 2 p.m. on a weekday in April 2015


Table 15.2 Decision rules for each action

Action             Conditions
                   Current node        Home node           Return distance
Evacuate           Flooded area        Flooded area        -
                                       Non-flooded area    $10 < d_r < 20$ (not returning home) or $20 \le d_r$
Return home        -                   Non-flooded area    $d_r \le 10$ or $10 < d_r < 20$ (returning home)
Remain in place    Non-flooded area    Flooded area        -
                                       Non-flooded area    $10 < d_r < 20$ (not returning home) or $20 \le d_r$


Table 15.3 Breakdown of numbers of people involved in each activity

Action               Number of people
Evacuate             701649
Return home          1408990
  (via a bridge)     150994
  (others)           1257996
Remain in place      585097
Total                2696546


Table 15.3 lists the breakdown of the number of people for each activity classified
according to these rules. Of the people who return home on foot, approximately
150000 cross bridges over the Yodo River, and their routes are the targets of route
optimization. Everyone else returning on foot is assumed to take the shortest route.


15.5 Simplifying and Restoring Large Road Network 
for Route Optimization 


The cost of computing the quickest flow depends greatly on the scale of the network. Although
the main part of the quickest-flow algorithm is computing the maximum flow, the 
effect of parallelization on this algorithm is limited. Therefore, it is difficult to apply 
this algorithm to a large network, even by using a recent central processing unit with 
many cores. In this study, we simplify the large-scale road network and optimize the 
routes for evacuation and returning home using the quickest flow. We also develop a 
method for restoring the optimized routes to the original road network. The proposed 
method is outlined below and illustrated in Fig. 15.7. 


15.5.1 Simplification of Road Network 


The basic idea is to construct a simplified road network by dividing the space into
the polygons of the sub-regions of the area and connecting the centers of gravity of
adjacent sub-regions with straight lines. The number of lanes of a simplified edge is
the sum of the numbers of lanes of the original edges crossing the line segments
shared by the two polygons. If no original edge crosses
between two sub-regions, then the two polygons are not connected by a simplified
edge. The length of a simplified edge is the Euclidean distance between the centers of
gravity, and commuters assigned to nodes in a polygon are aggregated at its center
of gravity.


Fig. 15.7 Simplification and restoration of road network


The following procedures were carried out using GIS software to simplify the
original road network: recognizing adjacent polygons, decomposing polygons into
line segments, generating the centers of gravity of the sub-region polygons, extracting
the road edges that cross the line segment shared by each pair of adjacent polygons, and
generating the simplified road network. The resulting simplified network had 36276 edges
and 15853 vertices, these being approximately 4% and 3%, respectively, of those of the
original road network.
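
The aggregation step can be illustrated without GIS software. The sketch below assumes that each original node has already been assigned to a sub-region and that an original edge "crosses" the boundary between two sub-regions when its endpoints lie in different sub-regions; all names and the data layout are hypothetical.

```python
import math
from collections import defaultdict


def simplify(node_region, node_population, edges, centroids):
    """Aggregate a road network onto sub-region centers of gravity.

    node_region:     dict node -> sub-region id
    node_population: dict node -> number of people placed on the node
    edges:           list of (node_u, node_v, number_of_lanes)
    centroids:       dict sub-region id -> (x, y) center of gravity
    """
    lanes = defaultdict(int)
    for u, v, n_lanes in edges:
        ru, rv = node_region[u], node_region[v]
        if ru != rv:  # the edge crosses between two sub-regions
            lanes[tuple(sorted((ru, rv)))] += n_lanes

    # One simplified edge per connected pair, as long as the straight line
    # between the two centers of gravity.
    simplified = {
        pair: {"lanes": n, "length": math.dist(centroids[pair[0]], centroids[pair[1]])}
        for pair, n in lanes.items()
    }

    population = defaultdict(float)
    for node, people in node_population.items():
        population[node_region[node]] += people
    return simplified, dict(population)


# Hypothetical example with two sub-regions.
demo_regions = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
demo_population = {"a1": 10, "a2": 5, "b1": 7, "b2": 0}
demo_edges = [("a1", "a2", 2), ("a1", "b1", 2), ("a2", "b2", 4)]
demo_centroids = {"A": (0.0, 0.0), "B": (3.0, 4.0)}
```

In this example the two crossing edges merge into a single simplified edge with six lanes and length 5, while the edge inside sub-region A disappears.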


15.5.2 Restoring Optimized Routes on Original Road 
Network 


Let $A$ be the set of sub-regions traversed by an origin-destination (OD) path in the set
of optimized OD paths on the simplified road network. In this study, route restoration
refers to obtaining the corresponding OD path in the original road network as the
shortest path within the part of the road network contained in $A$. However, with this
method, the destination may not be reachable using only the road network in $A$ (see
Fig. 15.7d2). In that case, the route obtained by the optimization is not used and is
replaced by the shortest path in the whole road network. Then, the extent to which the
original OD path could be restored from the route in $A$ is evaluated as the
reproduction rate.
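
Route restoration with its fallback can be sketched as a shortest-path search restricted to the nodes of the traversed sub-regions. The code below uses a textbook Dijkstra search on an adjacency-list graph; the function names, the graph representation, and the Boolean flag reporting whether the optimized route was kept are assumptions of this sketch.

```python
import heapq
import math


def shortest_path(adj, src, dst, allowed=None):
    """Dijkstra on adj = {node: [(neighbor, length), ...]}.

    If `allowed` is given, only nodes in that set may be used.
    Returns the node list from src to dst, or None if dst is unreachable.
    """
    if allowed is not None and (src not in allowed or dst not in allowed):
        return None
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            if allowed is not None and v not in allowed:
                continue
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def restore_route(adj, node_region, src, dst, regions_on_path):
    """Restore an optimized route; fall back to the global shortest path.

    Returns (path, used_optimized_route).
    """
    allowed = {n for n, r in node_region.items() if r in regions_on_path}
    path = shortest_path(adj, src, dst, allowed)
    if path is not None:
        return path, True
    return shortest_path(adj, src, dst), False
```

Counting how often the flag is true over all OD pairs gives a ratio of the kind reported later for the restored routes.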


15.6 Route Optimization Settings 


Thus, the routes for evacuation and returning home can be optimized. To prioritize 
human life, we first secure evacuation routes for tsunami evacuees and then optimize 
the routes for people walking home. The procedure and settings are described below. 


15.6.1 Optimization Steps 


First, we explain the concept of optimization for tsunami evacuees. As mentioned in 
Sect. 15.4, in Osaka City, there are many tsunami evacuation buildings in the expected 
flooded area, and the plan is to evacuate to those buildings. However, many areas 
may continue to be flooded for several days even after drainage is carried out, and it 
is feared that many tsunami evacuation buildings will be isolated by flooding. 

In the event of a tsunami disaster in a large city, it is reasonable to suppose that not 
many evacuees will use the tsunami evacuation buildings, given the limited resources 
for rescuing evacuees from such buildings. Therefore, it is necessary to clarify which 
areas contain evacuees who can only evacuate to a tsunami evacuation building. In 
this study, we optimize the destinations and routes of evacuees in the following three 
steps. 


Step 1 

In the simplified road network, the destinations of evacuees are set not as the tsunami 
evacuation buildings but as the intersections of the boundaries of the flooded-area 
polygons and the intersecting edges. Then, they are connected to one super sink, the 
route is optimized by the universal quickest flow, and the evacuation completion time 
for each evacuee is calculated. 


Step 2 

For evacuees whose evacuation completion time determined in step 1 exceeds 1 h
50 min, the evacuation routes are optimized again using the universal quickest
flow so that they evacuate to tsunami evacuation buildings. This optimization is
executed on the residual network of the time-expanded network used in step 1,
excluding the flows of the evacuees who are re-optimized in step 2.


Step 3 

The routes of approximately 150000 commuters walking home across the four passable
bridges over the Yodo River shown in Fig. 15.5 are optimized using the universal
quickest flow. We refer to such pedestrians as "bridge passers." In this case, sinks are
set to nodes on the north side of each bridge and are connected to one super sink. In
addition, the residual network used in the optimization up to step 2 is used. People
in areas other than the flooded area return home via the shortest route. This is
because there is less congestion than on the bridges and because the optimization
problem in this case becomes a general multi-commodity flow problem, which is more
difficult than the quickest-flow problem.


15.6.2 Computational Conditions 


The time unit of the quickest flow was set to 10 s considering the computational
time and available memory. The walking speed of a pedestrian was set to 1 m/s.
Meanwhile, the free walking speed in the one-dimensional pedestrian model was set
to 1.023 m/s, following the original model [20]. The optimization and
simulation code was implemented using Visual C++ 2015, and LEDA 6.4 was
used as a network library to solve the maximum-flow problem. The optimization
and simulation were carried out on a personal computer (PC) with Windows 10
Professional 64 bit, an Intel Core i7-6700K, and 32 GB of memory.
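
To give a sense of why the time unit matters for memory, the following sketch converts edge lengths into discrete travel times and estimates the size of a time-expanded network. Rounding travel times up to at least one time step is an assumption of this sketch, and the 1 h 50 min horizon is used only as an illustrative value.

```python
import math

TIME_UNIT_S = 10        # time unit of the quickest flow [s]
WALK_SPEED_MPS = 1.0    # walking speed used for the optimization [m/s]


def travel_time_steps(length_m, speed=WALK_SPEED_MPS, unit=TIME_UNIT_S):
    """Travel time of an edge in time steps (rounded up, at least one step)."""
    return max(1, math.ceil(length_m / speed / unit))


def expanded_size(n_vertices, n_edges, horizon_steps):
    """Node count and an upper bound on the arc count of a time-expanded network."""
    nodes = n_vertices * (horizon_steps + 1)
    stagnant_arcs = n_vertices * horizon_steps
    movement_arcs_upper = n_edges * (horizon_steps + 1)
    return nodes, stagnant_arcs + movement_arcs_upper


# Simplified network of this study with an illustrative horizon of 1 h 50 min.
horizon = (110 * 60) // TIME_UNIT_S
nodes, arcs_upper = expanded_size(15853, 36276, horizon)
```

With a 10-s time unit, this horizon corresponds to 660 steps, so the simplified network already expands to about 10.5 million nodes; the original network of 621670 nodes would be roughly 40 times larger.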


15.7 Results of Route Optimization 


The optimization results are shown below, where the evacuation completion time 
is the result of each pedestrian walking along the designated route using the one- 
dimensional pedestrian model on the original road network. 


15.7.1 Computational Times 


The computational times for the route optimization for the two types of pedestrian are
listed in Table 15.4. Even though the PC that was used was of an older specification
dating back several generations, the optimizations finished in about 30 h for evacuees
and about 45 h for bridge passers. In other words, the problem could be computed even
with such a low-specification PC. Although there were fewer bridge passers, their
optimization took longer, probably because their routes were longer.


Table 15.4 Computational time for each optimization

Pedestrian type    Computational time [h:min]
Evacuee            29:56
Bridge passer      44:46


15.7.2 Reproducibility of Restored Routes


We analyze the reproducibility of the routes optimized on the simplified network
after they are restored to the original network. The reproducibility is evaluated by the
difference in the length of a route before and after the restoration and by the selection
ratio of the optimized routes. Table 15.5 lists the mean route lengths before and after
network restoration for each type of pedestrian. In the case of evacuees, the mean route
length increases after restoration, whereas it decreases for bridge passers. However,
the restoration does not cause an extremely large difference in either case.

Table 15.6 lists the selection ratio, which is the fraction of each type of pedestrian
using the routes obtained by the universal quickest flow after network restoration.
Although the selection ratio for evacuees exceeded 80%, that for bridge passers
was only 64%. Because the routes are longer for the latter, this is thought to have
increased the number of cases in which a route cannot be constructed within the
limited range.


15.7.3 Optimization Results 


Here, we assess by how much the optimization shortened the travel time compared 
with that of the shortest route. 

First, regarding the movement by evacuation, Fig. 15.8 shows how the cumulative
number of evacuees for each type of route varies with time, and Table 15.7 lists
the mean evacuation time and evacuation completion time. Although the cumulative
numbers of evacuees for both types of route vary similarly, the effect of the
optimization is evident because it shortens the evacuation completion time by
approximately 1 h compared with that of the shortest route. However, the evacuation
completion time is over 7 h, which is too long to avoid the impact of the tsunami. This
is considered to be a result of interference between the routes of evacuees and people
returning home. At the time of optimization, priority was given to evacuees, but this
prioritization may have been lost upon restoring the routes. Regardless, it is suggested
that evacuees should avoid evacuating outside the flooded area by using the tsunami
evacuation buildings as much as possible.


Table 15.5 Mean route lengths for each type of pedestrian before and after network restoration

Pedestrian type    Mean length [m]
                   Before    After
Evacuee            1798      2013
Bridge passer      7712      7593


Table 15.6 Selection ratios for optimized routes

Pedestrian type    Total     Number of optimized-route selectors    Selection ratio
Evacuee            701649    587585                                 0.84
Bridge passer      150994    97372                                  0.64


Fig. 15.8 Cumulative number of arriving evacuees for each route type


Table 15.7 Comparison of travel times of evacuees for both route types

Route type         Mean travel time [h:min]    Travel completion time [h:min]
Optimized route    1:18                        7:13
Shortest route     1:22                        8:18

Next, we perform a similar verification for bridge passers. Figure 15.9 shows how
the cumulative number of arriving people varies with time, and Table 15.8 compares
the mean travel times and travel completion times for both route types. In the case of
the shortest route, congestion on the bridges begins early, after which the cumulative
number of arriving people increases at a relatively low, nearly constant rate. As
a result, the optimization with consideration of securing routes for tsunami
evacuees drastically shortened the mean travel time by 3 h 20 min and the travel
completion time by 3 h 36 min.


Fig. 15.9 Cumulative number of arriving bridge passers for each route type


Table 15.8 Comparison of travel times of bridge passers for both route types 


Route type Mean travel time [h:min] Travel completion time [h:min] 
Optimized route 5:32 16:47 
Shortest route 8:52 20:23 


The effect of the optimization was demonstrated, especially for bridge passers. To 
understand the changes concretely, the total numbers of pedestrians passing along 
each road for both route types are visualized in Fig. 15.10. In the case of the shortest 
route, people returning home are concentrated on the Nagara Bridge, but when the 
route is optimized, two bridges upstream from the Nagara Bridge are used. 

People generally use the Shin-Yodogawa Bridge to travel to the north of the Yodo
River from Osaka City, but in this case that bridge cannot be used because it is
in the tsunami-flooded area. Therefore, with no restrictions, most people returning
home would cross the Nagara Bridge, which is the next one upstream of the
Shin-Yodogawa Bridge. Bridges further upstream than the Nagara Bridge are not usually
used for transportation from Osaka City because they are located more than 3 km
away. However, the optimization means that these bridges are also used for returning
home, and congestion is reduced.

Fig. 15.10 Total number of pedestrians passing at each road edge for each route


15.8 Conclusion 


In this study, we proposed a method for network simplification and restoration to 
optimize the traveling routes of more than 2 million pedestrians with a large-scale 
and detailed road network in Osaka City and its surrounding area. We then showed 
that such route optimization worked well.

Chapter 16
Stream-Based Lossless Data Compression


Shinichi Yamagiwa 


Abstract In this chapter, we introduce aspects of applying data-compression techniques.
First, we study the background of recent communication data paths. The focus of this
chapter is a fast lossless data-compression mechanism that handles data streams
completely. A data stream is a continuous, unterminated flow of massive data generated
by sources such as movies and sensors. We introduce LCA-SLT and LCA-DLT, which accept
data streams, as well as several implementations of these stream-based compression
techniques. We also show optimization techniques for efficient implementation in
hardware.


16.1 Introduction to Stream-Based Data Compression 


Computer systems demand fast communication data paths to improve performance, and the
fastest data paths have recently reached the order of tens of gigahertz as implemented
by optical fiber. One way to achieve fast communication data paths is to parallelize
the paths over multiple connections, but technological trials have offered no clear
solutions because of electrical and physical limitations such as crosstalk and
refraction. To overcome the problems associated with high-speed communication, this
chapter focuses on data compression on the data path. There are two ways in which this
can be implemented. One is software-based compression, which is typically implemented
on the lower layer of the communication data path, such as the device-driver level of
Ethernet [18]. The other is hardware-based implementation, which must provide low
latency and stream-based compression and decompression.

Well-known algorithms such as Huffman encoding [17] and Lempel-Ziv-Welch
(LZW) compression [21, 22] perform data encoding by creating a symbol lookup
table (LUT), in which frequent data patterns are replaced by compressed symbols
in the table. However, hardware implementation presents the following difficulties:
(1) the processing time is unpredictable because the data length is not deterministic,
(2) maximal memory must be prepared because the lengths of the data patterns are
not deterministic, and (3) blocking decompression is performed. Here, we focus on a
stream-based lossless data-compression mechanism that overcomes these problems.
The key technology is a histogram mechanism that caches the compressed data.
The decompressor must manage the same table contents as the compressor side and
reproduce the original data from the table. In this chapter, we introduce the
challenges of implementing stream-based lossless compression in hardware. The ultimate
goal is to implement compact and fast data-compression hardware that accepts continuous
data streams without blocking the compression operations. We begin by focusing on a
technique with a static LUT, called LCA-SLT, and then we show one with a dynamic table,
called LCA-DLT. We also describe performance optimizations for LCA-DLT.


16.2 Stream-Based Lossless Data Compression with Static 
Look-Up Table 


16.2.1 Design of LCA-SLT 


We begin by focusing on a compression algorithm called online LCA (Lowest Common
Ancestor) [12], which converts a symbol pair to an unused symbol by using a managed
LUT of symbol pairs. Figure 16.1 shows an example of compressing the sequence
ABCDFFBC to the symbol Z. Online LCA addresses the problems caused by conventional
dynamic LUT management and provides a fixed time complexity because it matches only
two symbols at a time. During decompression, online LCA applies the opposite mappings,
repeatedly converting one symbol to two according to the table, starting from the
deepest compression step.

Applying the concept of online LCA, we show here the mechanism of LCA-SLT
(LCA Static Look-up Table) [20], which prepares statically allocated LUTs that
are used for converting symbol pairs. The compressor encodes inputted symbols
using the LUTs, and the decompressor does the opposite. The contents of the tables
are fixed and stored before the compression/decompression starts. The tables
are prepared heuristically in the following steps: (1) a test set of the target data is
examined by online LCA, (2) the LUTs are created from all the original symbol pairs
and their matching symbols, (3) the entries in the LUTs are sorted in descending order
of frequency, and finally (4) the entries in the top ranks are registered as the table
contents. These steps implement the best matching patterns in the original data set
as determined by the frequency analysis.
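
As an illustration only, the table preparation in steps (1)-(4) can be sketched in
Python. The sketch is a simplification under stated assumptions: it counts
non-overlapping adjacent symbol pairs directly instead of running online LCA on the
test set, and the names `build_static_lut` and `compress_slt` are hypothetical.
Compressed symbols are represented as integers so that they can be told apart from raw
characters.

```python
from collections import Counter


def build_static_lut(test_data, num_entries):
    """Return {pair: index} for the most frequent symbol pairs in test_data."""
    # Count non-overlapping adjacent pairs (a simplification of online LCA).
    pairs = Counter(
        (test_data[i], test_data[i + 1]) for i in range(0, len(test_data) - 1, 2)
    )
    # Keep only the top-ranked (most frequent) pairs as the static table contents.
    return {pair: idx for idx, (pair, _) in enumerate(pairs.most_common(num_entries))}


def compress_slt(data, lut):
    """Replace each pair found in the static LUT by its index; pass others through."""
    out = []
    for i in range(0, len(data) - 1, 2):
        pair = (data[i], data[i + 1])
        if pair in lut:
            out.append(lut[pair])   # compressed symbol
        else:
            out.extend(pair)        # uncompressed pair
    if len(data) % 2:               # an odd trailing symbol is passed through
        out.append(data[-1])
    return out


lut = build_static_lut("ABABABCD", num_entries=1)
print(lut)                          # {('A', 'B'): 0}
print(compress_slt("ABCDAB", lut))  # [0, 'C', 'D', 0]
```

Because the table never changes after it is built, the compressed stream carries no
table-management information, which is the first advantage of the static approach.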

As shown in Fig. 16.2, the compressor and decompressor perform online LCA
using the tables created from a set of test data patterns. The modules are connected
one after another through the compressed and decompressed data lines and form a
pipeline for recursive compression/decompression operations.

The method with static LUTs has two main advantages. First, the compressed 
data never include any additional information for table management. Second, the 
amount of table resources is deterministic. Therefore, LCA-SLT can be implemented 
on compact hardware and is fast because of its simple compression/decompression 
operations. 


16.2.2 Implementation of LCA-SLT 


On hardware, the compressor and decompressor can be implemented using a
content-addressable memory (CAM) [8] and a normal memory (MEM), respectively. A CAM
is a type of hardware that takes a set of data bits as input and outputs the address
at which matching data are stored. Figure 16.3 shows the organization of the
compression part. As an example, the combination of two symbols is 16 bits wide when
the symbol width is 8 bits, and we add one more bit to each compressed datum, called
the compression mark (CMark) bit, to mark whether it is compressed. Figure 16.3 shows
a compression pipeline in which four modules are connected. Each module adds another
CMark bit, so the width of the compressed data grows by one bit per module. Thus, the
compression module at the end of the pipeline generates 12-bit compressed data.
Decompression involves the same operations as the compression steps but in the
opposite direction.

Fig. 16.3 Organization of ... (figure labels: content-addressable memory; normal memory receives the matching address)
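
The bit-width arithmetic above can be checked with a short Python sketch. It assumes
that every module pairs two of its input symbols and appends exactly one CMark bit;
the function names are hypothetical.

```python
def compressed_width(symbol_bits, num_modules):
    """Bits per output symbol after num_modules cascaded modules (one CMark bit each)."""
    return symbol_bits + num_modules


def max_pattern_length(num_modules):
    """Longest run of original symbols that a single output symbol can stand for."""
    return 2 ** num_modules


# Columns: module number, input pair width, output width, longest pattern (symbols)
for m in range(1, 5):
    pair_bits = 2 * compressed_width(8, m - 1)
    print(m, pair_bits, compressed_width(8, m), max_pattern_length(m))
```

For 8-bit symbols, the first module matches a 16-bit pair and the fourth module emits
12-bit data, in agreement with the text.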


16.2.3 Performance Evaluations 


We discuss here the performance of the LCA-SLT. We evaluate the compression ratio
and the matching ratio to the symbol pairs in the LUT during the compression. The
table is implemented with a fixed number of entries, namely, 32, 64, 128, or 256.
For the evaluations, we use Linux source codes of 50 and 200 MB, as well as a
DNA sequence of 50 MB downloaded from [2]. Figure 16.4 shows the compression
ratio (the data size after compression divided by the original size) and the matching
ratio of the symbol pairs during the compression. With increasing number of table
entries, the compression ratio improves and the matching ratio of the symbol pairs
becomes about 60%.

Next, we show the implementation of the LCA-SLT module with 8-bit symbols and
4-bit CMark on a Xilinx Spartan-6 field-programmable gate array (FPGA;
IC code XC6SLX45-3CSG324). We have two options for implementing the CAM:
a shift register LUT (SRL)-based CAM or a block RAM (BRAM)-based CAM. We can
implement the MEM by using the BRAM on the FPGA. Table 16.1 shows the
compilation reports. The operation timings of both the SRL-based and BRAM-based
CAM are precisely the same. However, the number of used slice registers is larger
than that of the LUTs in the case of BRAMs because the latches are not packed
into the LUTs. The LUTs are used for the combinational logic for the I/O buses
around the memory. In addition, the SRL-based case increases the number of LUTs.
Therefore, when an application needs many LUTs, such as wide data/address buses
for a processor interface, it is effective to implement LCA-SLT. On the other hand,
the SRL-based implementation shows that the maximal frequency for the input clock
decreases drastically with increasing number of LUT entries. In the FPGA case,
we must consider how the number of table entries affects the performance because
the limited number of physical wires in the large-scale integration decreases the
routing availability when the matching address bits from the CAM become wide.

Fig. 16.4 Performances of LCA-SLT

Thus, the LCA-SLT implements a compression mechanism with small overhead
for data streams. It is reconfigurable depending on the characteristics of the target
data: the desired performance can be addressed by adjusting the number of
compression/decompression modules, the number of bits in a symbol, or the number of
available symbol mapping entries in the LUT.

Table 16.1 Compilation reports regarding hardware implementations of LCA-SLT

Entries | CAM type | # of slice registers | # of slice LUTs | # of BRAMs | Max freq.
128 | SRL-based | 141 | … | … | 93 MHz
128 | BRAM-based | … | … | … | 93 MHz
256 | SRL-based | 144 | 4135 | 1 | 75 MHz
256 | BRAM-based | 638 | 1546 | 41 | 101 MHz
512 | SRL-based | 152 | 8128 | 1 | 51 MHz
512 | BRAM-based | 1152 | 2567 | 81 | 81 MHz


16.3 Stream-Based Lossless Data Compression 
with Dynamic Look-Up Table 


16.3.1 Design of LCA-DLT 


Next, we focus on another algorithm for stream-based data compression with
dynamic table management, called LCA-DLT (LCA Dynamic Look-up Table) [19].
It allocates corresponding symbol LUTs for the compressor and the decompressor,
respectively. Each table has any number $N$ of entries, and the $i$-th entry $E_i$
includes a pair of the original symbols $(s0_i, s1_i)$, a compressed symbol $S_i$, and
a frequency counter $count_i$. The compressor side uses the following rules: (1) it
reads two symbols $(s0, s1)$ from the input data stream, and if they match $s0_i$ and
$s1_i$ in a table entry $E_i$, then after incrementing $count_i$, it outputs $S_i$ as
the compressed data; (2) if the symbols do not match any entry in the table, it outputs
$(s0, s1)$ and registers an entry $(s0_k, s1_k, S_k, count_k = 1)$, where $S_k$ is the
index number of the entry; (3) if all entries in the table are used, it decrements all
$count_i$ ($0 \le i < N$) until any count becomes zero, and then deletes the
corresponding entries from the table. When compressed data $S$ are transmitted from
the compressor, the steps in the decompressor are equivalent to those in the
compressor. The symbol matching is performed based on $S_i$ in an entry. If the
compressed symbol $S$ matches $S_k$ in a table entry, then $(s0_k, s1_k)$ is outputted.
If not, then another symbol $S'$ is read from the compressed data stream, the pair
$(S, S')$ is outputted, and the pair is registered in the table.




Fig. 16.5 Compression example for the LCA-DLT (input: ABABCDACABEFDCAB). Panels:
(a) when a pair of symbols does not match in the table (output: AB); (b) when a pair
of symbols matches in the table (output: AB0); (c) when the table is full, entries are
invalidated (output: AB0CDAC0EF); (d) a new entry is added in the table after
invalidation (output: AB0CDAC0EFDC)

Fig. 16.6 Decompression example for the LCA-DLT (input: AB0CDAC0EFDC0). Panels:
(a) when a compressed symbol does not match in the table (output: AB); (b) when a
compressed symbol matches in the table (output: ABAB); (c) when the table is full,
entries are invalidated (output: ABABCDACABEF); (d) a new entry is added in the table
after invalidation (output: ABABCDACABEFDC)


When the table is full, the decompressor performs the same operations as the
compressor.

Figures 16.5 and 16.6 show examples of compression and decompression operations,
respectively. Here, the input data stream for the compressor is ABABCDACABEFDCAB.
First, the compressor reads the first two symbols AB and tries to match that pair in
the table (Fig. 16.5a). However, the matching fails, and the compressor registers A
and B as s0 and s1 in the table. The compressed symbol assigned to the entry is its
table index, 0. Thus, the rule AB→0 is created. The count is initially set to 1. When
the compressor next reads a pair of symbols (again AB), the pair matches in the table,
and the compressor translates AB to 0 (Fig. 16.5b). Subsequent pairs are processed in
the same way. If the table becomes full (Fig. 16.5c), then the compressor decrements
the counts of all entries until any count becomes zero. Here, three entries are
invalidated from the table in the figure. The compressor registers a new entry in the
invalidated entry with the smallest index. Figure 16.5d shows that the compressor
added a new entry after the invalidation. Finally, the original input data are
compressed to AB0CDAC0EFDC0.

The decompressor reads A first (Fig. 16.6a), but it does not match any compressed
symbol in the table (because the table is empty). The decompressor then reads another
symbol B and registers AB in a new table entry. The entry saves the rule AB→0. Thus,
the output becomes AB. The decompressor reads the next symbol 0 (Fig. 16.6b),
which matches the table entry. The decompressor translates it to AB and outputs
it again. After the subsequent decompression operations, when the table becomes
full, the decompressor decrements the counts in the same way as the compressor side
(Fig. 16.6c). The invalidated entries must be equivalent to those on the compressor
side. Therefore, the compressed symbols are consistently associated with the original
symbols. Finally, the compressed data inputted to the decompressor are translated
and outputted as ABABCDACABEFDCAB, which is the same pattern as the input
data on the compressor side.
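
The compressor and decompressor rules can be traced with a minimal software sketch.
This is not the hardware design: it is a Python model under the assumptions that the
table has four entries (as the worked example suggests), that invalidation runs as
soon as a registration fills the table, and that compressed symbols are integers while
raw symbols are characters (the type distinction stands in for the CMark bit). All
function names are hypothetical.

```python
def _register(table, counts, pair, capacity):
    """Store pair at the smallest free index, then invalidate entries if the table is full."""
    index = next(i for i in range(capacity) if i not in table)
    table[index], counts[index] = pair, 1
    if len(table) == capacity:
        # Decrement all counters until at least one reaches zero ...
        while min(counts.values()) > 0:
            for i in counts:
                counts[i] -= 1
        # ... and delete every entry whose counter is zero.
        for i in [i for i, c in counts.items() if c == 0]:
            del table[i], counts[i]


def compress(data, capacity=4):
    table, counts, out = {}, {}, []
    for k in range(0, len(data) - 1, 2):
        pair = (data[k], data[k + 1])
        index = next((i for i, p in table.items() if p == pair), None)
        if index is None:              # rule (2): miss, pass the pair through
            out.extend(pair)
            _register(table, counts, pair, capacity)
        else:                          # rule (1): hit, emit the entry index
            counts[index] += 1
            out.append(index)
    if len(data) % 2:                  # an odd trailing symbol is passed through
        out.append(data[-1])
    return out


def decompress(tokens, capacity=4):
    table, counts, out = {}, {}, []
    stream = iter(tokens)
    for tok in stream:
        if isinstance(tok, int):       # compressed symbol: look up the pair
            counts[tok] += 1
            out.extend(table[tok])
        else:                          # raw symbol: read its partner and register
            partner = next(stream, None)
            if partner is None:
                out.append(tok)
                break
            out.extend((tok, partner))
            _register(table, counts, (tok, partner), capacity)
    return "".join(out)


packed = compress("ABABCDACABEFDCAB")
print(packed)              # ['A', 'B', 0, 'C', 'D', 'A', 'C', 0, 'E', 'F', 'D', 'C', 0]
print(decompress(packed))  # ABABCDACABEFDCAB
```

Both sides call the same `_register` routine, which is what keeps the two tables
identical without sending any table contents over the data path.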


16.3.2 Implementation of LCA-DLT 


Figure 16.7 shows an implementation of the LCA-DLT. The input data are propagated
through the latches, and the compressed/decompressed data are processed in
a pipeline manner. The LUT in the compressor is organized as shown in Fig. 16.8a.
The symbol LUT performs the compressed/decompressed data association. Here, the
index becomes the compressed symbol, and the enable signal from the matching part
increments the count. The full management logic of the LUT activates the invalidate
control: it decrements the count and resets the valid bits (v in the figure) of
the invalidated entries. The LUT in the decompressor is organized with a RAM
and a CAM as shown in Fig. 16.8b. The management part for the count, based on a CAM,
works in the same way as that of the compressor. The matching part, in contrast,
is implemented simply in a RAM. The compressed data are inputted to the RAM as the
address, and the original uncompressed data pair is read out.

The invalidate operation looks for the minimal counts in the table entries by
decrementing those counts. During the operation, the stall signal is outputted to stop
the compression/decompression data pipeline. Figure 16.9a shows an implementation
based on parallel decrement logic, and Fig. 16.9b shows one based on serial
decrement logic. These two implementations have a tradeoff between the amount of
logic and the compression speed when the table becomes full.

In the LCA-DLT, as in the LCA-SLT, the compressor adds the CMark bit that
indicates whether or not the symbol is compressed. Moreover, by combining the
compressor and decompressor in a module and cascading the modules as shown in
Fig. 16.10, we can compress long symbol patterns corresponding to 2, 4, 8, or 16
symbols when there are four modules. If the input data at the first compressor are 8
bits long, then the output compressed data become 12 bits after four modules because
of the CMark bits.

Fig. 16.7 Overall functional block diagrams of the compressor and decompressor in LCA-DLT.
The compressor's LUT receives two input symbols from the latches and outputs the selected signal to
the multiplexer for the output data. The decompressor's LUT performs the opposite data translation

Fig. 16.8 Detailed organization of LUTs in LCA-DLT. The table has $2^n$ entries when a symbol is n
bits. The matching part for s0 and s1 must be organized as a content-addressable memory (CAM),
which outputs the index (i.e., the address in the CAM) matched to an inputted pair of (s0, s1). The
management part for count is also organized by a CAM

Fig. 16.9 Decrementing logic for entry invalidation in LCA-DLT

Fig. 16.10 Cascading modules of LCA-DLT. This example compresses long symbol patterns
corresponding to 2, 4, 8, and 16 symbols. If the input data at the first compressor are 8 bits long, then
the output compressed data become 12 bits because of the CMark bits

Fig. 16.11 Compression performances of LCA-DLT


16.3.3 Performance Evaluations 


Figure 16.11 shows the compression ratios, defined as (compressed_data_size /
original_data_size) × 100. The numbers of table entries are varied from 16 to 256.
Focusing on the performance impact of the number of table entries, the compression
ratios improve linearly except for the gene DNA sequence; because the DNA data have
only a few patterns, all patterns can be saved in 16 entries. Furthermore, focusing on
the impact of the number of modules, the compression ratios degrade in the case of
more than two modules. This means that a communication data path using too many
compression modules becomes disadvantageous because of the CMark bit added
after each module.
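
To make the CMark overhead concrete, the ratio can be evaluated for the worked example
of Sect. 16.3.1, in which 16 input symbols are compressed to 13 output symbols by one
module. The sketch below assumes 8-bit input symbols and one CMark bit on every output
symbol; the function name is hypothetical.

```python
def compression_ratio_percent(num_output_symbols, num_input_symbols,
                              symbol_bits=8, cmark_bits=1):
    """(compressed_data_size / original_data_size) x 100, counted in bits."""
    compressed_bits = num_output_symbols * (symbol_bits + cmark_bits)
    original_bits = num_input_symbols * symbol_bits
    return 100.0 * compressed_bits / original_bits


print(compression_ratio_percent(13, 16, cmark_bits=0))  # 81.25 (symbol count only)
print(compression_ratio_percent(13, 16))                # 91.40625 (with one CMark bit)
```

Each additional module widens every output symbol by one more bit, so a module that
finds few matches can make the overall ratio worse rather than better.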

Figures 16.12 and 16.13 show the hardware performances of the LCA-DLT. It was
implemented with only hundreds of slices and a memory block in the FPGA. The
LCA-DLT works at 100 MHz with any number of modules, thereby achieving 800
Mbit/s. The LCA-DLT has a large impact on resource usage with respect to the logic
but not the memory because recent FPGAs do not have any dedicated hardware
macros for CAMs; the CAM is inevitably implemented by LUTs and registers in the FPGA.
We also compare the amount of hardware resources between the parallel and serial
invalidation mechanisms. The parallel version uses more hardware resources;
regarding the dynamic performance of the LCA-DLT, the parallel version involves
very few stalls, but its hardware resources explode. Assuming that the hardware
works at 100 MHz, the effective bandwidth at the input of the first compressor is
about 800 and 340-730 Mbit/s with the parallel and serial invalidations, respectively.
The output bandwidth of the second compressor will be reduced to 35-80% of the
original data size. This means that the LCA-DLT realizes a communication data path
that can send more data even if the speed of the path is slow, and it also contributes
largely to realizing a high-speed communication data path while providing flexible
adjustment between the hardware resources and the compression performance.

Fig. 16.12 Hardware resources of LCA-DLT. It is compiled with 8-bit data input in the first
compressor for the Xilinx Artix-7 device (XC7A200T-1FBG676C)

Fig. 16.13 Performance comparison between parallel and serial invalidation mechanisms with two
modules


16.4 Optimization Techniques for LCA-DLT


Here we introduce optimization techniques for implementing the LCA-DLT. We 
consider two available optimization techniques: lazy management and time-sharing 
multi-threading. 


16.4.1 Lazy Management of Look-Up Tables 


First, we consider the techniques of dynamic invalidation on LUTs and lazy
compression [11] that eliminate stalls during the LUT invalidations.


16.4.1.1 Dynamic Invalidation for Look-Up Table 


With the management technique of dynamic invalidation for the symbol LUT, we
prepare a remove pointer and an insertion pointer. Initially, the remove pointer points
to any entry of the symbol LUT. When the pointer comes to table index $i$, $count_i$
is decremented, and if $count_i$ becomes zero after the decrement, then the
entry is removed from the table. The pointer is moved to the next table index after
every table search operation. By contrast, the insertion pointer initially points to
any empty entry in the symbol LUT; when that entry is used, the pointer moves to
another unused entry. Using these two pointers, we can expect that a moderate number
of the occupied entries in the symbol LUT are removed.

Figure 16.14 shows an example of the dynamic invalidation mechanism for compression.
We assume that DCAADCBBDDB is inputted to the compressor and that
the remove pointer starts on the second entry of the table. First, DC does not match
any entry in the table (Fig. 16.14a), and the compressor waits for an empty entry to
appear. The remove pointer is moved to the next entry and the count value is
decremented. In Fig. 16.14b, the count value of the third entry becomes zero, whereupon
the entry is removed. The insertion pointer is moved to point to the empty entry.
The new entry for DC is registered where the insertion pointer is pointing. Now,
DC is outputted. During these operations, the input and output of the compressor
stall. When the input symbol pair matches an entry, it is compressed as shown in
Fig. 16.14c, d, the remove pointer is moved, and the count value is decremented. If
the entry that matches the input symbol pair is the one pointed to by
the remove pointer, then the count value does not change, as shown in Fig. 16.14e.
Finally, after the initially inserted DC is removed because of its count value, the
entry is used as a new one. Because it was not found in the table, DB is outputted.
Thus, the compressed data stream becomes DC021DB.
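
The pointer mechanism can be modelled in a few lines of Python. This is a sketch under
several assumptions, not a reproduction of Fig. 16.14: the initial table contents and
counts are hypothetical, the input is taken as the five pairs DC, AA, DC, BB, DB, the
insertion pointer is simplified to "the lowest empty slot", and a stall cycle is
counted for every pointer step taken while a missed pair waits for a free slot.

```python
class DynamicInvalidationCompressor:
    def __init__(self, entries, remove_ptr):
        # entries: list of (pair, count) tuples or None for an empty slot (hypothetical data)
        self.entries = [list(e) if e else None for e in entries]
        self.remove_ptr = remove_ptr
        self.stall_cycles = 0

    def _advance(self):
        """Move the remove pointer one step and decrement the entry it reaches."""
        self.remove_ptr = (self.remove_ptr + 1) % len(self.entries)
        entry = self.entries[self.remove_ptr]
        if entry is not None:
            entry[1] -= 1
            if entry[1] == 0:
                self.entries[self.remove_ptr] = None   # entry is invalidated

    def _find(self, pair):
        for i, e in enumerate(self.entries):
            if e is not None and e[0] == pair:
                return i
        return None

    def push(self, pair):
        """Process one symbol pair and return the output symbols."""
        index = self._find(pair)
        if index is not None:
            self.entries[index][1] += 1     # hit: count up, then move the pointer
            self._advance()
            return [index]
        if None in self.entries:
            self._advance()                 # miss with a free slot: one pointer step
        else:
            while None not in self.entries: # miss on a full table: stall until a slot frees
                self._advance()
                self.stall_cycles += 1
        self.entries[self.entries.index(None)] = [pair, 1]
        return list(pair)


comp = DynamicInvalidationCompressor(
    [(("A", "A"), 3), (("B", "B"), 3), (("X", "Y"), 1), (("P", "Q"), 3)], remove_ptr=1)
data = "DCAADCBBDB"
out = []
for k in range(0, len(data), 2):
    out += comp.push((data[k], data[k + 1]))
print("".join(map(str, out)))   # DC021DB
print(comp.stall_cycles)        # 6
```

With these assumed counts, the hit on BB lands on the entry under the remove pointer,
so its increment and decrement cancel, and the later miss on DB stalls until the entry
for DC counts down to zero.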

Figure 16.15 shows the steps of the decompression mechanism using the dynamic
invalidation. The inputted compressed data stream is the one generated by the
compression in Fig. 16.14. The insertion and the remove pointers begin from the same
entries initially defined by the compressor. Although the matching target is the
compressed data, the steps are equivalent to the ones performed on the compressor side.
In Fig. 16.15a, b, the I/O of the decompressor stalls. When the compressed symbol
matches an entry, the decompressor outputs the corresponding symbol pair, as
in Fig. 16.15c, d. Again, a stall occurs during the invalidation of an entry, as shown
in Fig. 16.15e, f. Finally, the original data stream is decoded.

Fig. 16.14 Example of the dynamic invalidation mechanism for compression

Fig. 16.15 Example of the dynamic invalidation mechanism for decompression

Fig. 16.17 Example of the lazy compression on decompressor side


16.4.1.2 Lazy Compression 


Another optimization technique is lazy compression. This technique skips
compression using the symbol lookup table when the symbol lookup table is full.
This eliminates stalls and continuously outputs the data to the decompressor side.

Figure 16.16 shows a compression example of lazy compression applied to the
LCA-DLT with dynamic invalidation. First, DC does not match any entry in the table.
Here, the lazy compression just passes the symbol pair through without registering
the pair in the table. Therefore, no stall occurs, as in Fig. 16.16a, e. When the symbol
pair matches an entry, the pair is compressed to the corresponding symbol, as shown
in Fig. 16.16b, d. If the table contains empty entries when the inputted symbol pair
does not match any entry, then the pair is registered in an empty entry and is also
passed through to the output, as in Fig. 16.16c. The output from the compressor becomes
DC0DC1DB, which is larger than DC021DB for the case of eager compression.

Figure 16.17 shows the case for the decompressor. First, D is not included in
the table; because the CMark bit is added to the compressed data, the decompressor can
tell that the input is an original data pair. The decompressor does not register the
pair and passes DC through to the output without any stall, as shown in Fig. 16.17a, e.
If the compressed data are in the table, then the decompressor translates them to the
original symbol pair, as in Fig. 16.17b, d. If the symbol is not in the table and there
are empty entries, then the inputted symbol pair is registered.

Fig. 16.18 Compression ratios with optimizations. The orange lines show lazy compression against
the full search method, and the blue ones show dynamic invalidation against the full search method.
The results depicted as lines were from using a compressor with four modules
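
A compact Python sketch of the lazy variant is given below for illustration. It makes
the same kind of assumptions as any software model of this mechanism: the initial
table contents and counts are hypothetical, the input is taken as the five pairs
DC, AA, DC, BB, DB, the remove pointer advances once per table search, and a new pair
is registered in the lowest empty slot. The function name is hypothetical.

```python
def lazy_compress(data, entries, remove_ptr):
    """Lazy variant: a miss on a full table passes the pair through without registering."""
    table = [list(e) if e else None for e in entries]
    out = []
    for k in range(0, len(data) - 1, 2):
        pair = (data[k], data[k + 1])
        hit = next((i for i, e in enumerate(table) if e and e[0] == pair), None)
        if hit is not None:
            table[hit][1] += 1
            out.append(hit)                 # compressed symbol
        else:
            out.extend(pair)                # always passed through, so no stall occurs
            if None in table:               # register only if an empty slot exists
                table[table.index(None)] = [pair, 1]
        # One remove-pointer step per table search (dynamic invalidation).
        remove_ptr = (remove_ptr + 1) % len(table)
        if table[remove_ptr] is not None:
            table[remove_ptr][1] -= 1
            if table[remove_ptr][1] == 0:
                table[remove_ptr] = None
    return out


initial = [(("A", "A"), 3), (("B", "B"), 3), (("X", "Y"), 1), (("P", "Q"), 3)]
tokens = lazy_compress("DCAADCBBDB", initial, remove_ptr=1)
print("".join(map(str, tokens)))   # DC0DC1DB
```

The first DC finds the table full and is not registered, so the second DC misses again;
this is the price paid in compression ratio for never stalling.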


16.4.1.3 Performance Evaluations 


Figure 16.18 shows the compression ratios with the above optimizations in the LCA-DLT.
The bars show the ratios (i.e., the compressed data size divided by the original
data size). We can confirm that the lazy compression effectively eliminates stalls and
does not disturb the compression, although it does not compress the inputted data
when the data pair does not match entries of the symbol LUT. Overall, both of the
proposed mechanisms provide more-effective compression ratios than does the full
search method. These mechanisms work well if the randomness of the data is high
(i.e., the data entropy is high).

Fig. 16.19 Stall cycles and the stall ratios against the total clock cycles in LCA-DLT with
optimizations

We measured the stall clock cycles to compare the dynamic performance of the hardware
implementation with that of the proposed techniques. We used a Xilinx Artix-7
FPGA XC7A200T-1FBG676C. The full search method works at 100 MHz in this
device as described in the previous section. By contrast, the implementation with
both proposed mechanisms works at 130 MHz because the implementation was simplified
by the lazy management of the symbol LUT.

Figure 16.19 shows the stall cycles as the bars and the stall ratios against the total 
clock cycles as the lines. The total throughput of the data stream becomes much 
better than that with the full search method. The degradation of the throughput is 
30% with the full search but less than 3% with dynamic invalidation. Regarding lazy 
compression, the compression delay is the number of clock cycles for the input data 
stream and is also the number of bytes of the input data (i.e., 10M cycles) because 
lazy compression never causes stalls. 
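
The relation between stall cycles and throughput can be written down as a small Python
sketch. It assumes that the compressor accepts one input symbol per non-stalled clock
cycle, so that the effective input bandwidth is the clock rate times the symbol width
times the fraction of non-stalled cycles; the function name and the 30% stall figure
used in the second call are illustrative assumptions.

```python
def effective_bandwidth_mbps(clock_mhz, symbol_bits, useful_cycles, stall_cycles):
    """Input bandwidth when one symbol is accepted per non-stalled clock cycle."""
    utilization = useful_cycles / (useful_cycles + stall_cycles)
    return clock_mhz * symbol_bits * utilization


# No stalls: 100 MHz x 8 bits gives the 800 Mbit/s quoted for the parallel invalidation.
print(effective_bandwidth_mbps(100, 8, 10_000_000, 0))   # 800.0

# Hypothetical case in which 30% of all cycles are stalls.
print(effective_bandwidth_mbps(100, 8, 7_000_000, 3_000_000))
```

Under this model, a method that never stalls keeps the utilization at 1, so its
throughput depends only on the achievable clock frequency.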


16.4.2 Time-Sharing Multithreading on Compression 


16.4.2.1 Design and Implementation of Time-Sharing Multithreading 


The time-sharing multi-threading [10] allows the compressor and decompressor to
accept multiple different data streams by dividing the dictionary updating operations
among the various input streams. When $N$ data streams are inputted to the
compressor/decompressor, the dictionary updating for each data stream allows $N-1$
clock cycles to be inserted to solve the updating problem. For example, Fig. 16.20 shows
a structure with two compressors that share the pipeline stages for the dictionary
updating operations while accepting two different data streams. This mechanism
does not cause any stalls in the input data streams; however, the bandwidth available
to each data stream of a whole compressor/decompressor module degrades to $1/N$.
On the other hand, the clock frequency is expected to increase.

Fig. 16.20 Example structure of time-sharing multi-threading (TSM)

In implementing the compression mechanism, the following operations are
assigned to stages of the encoder pipeline for the compressor hardware. The preprocess
operation is performed to prepare the subsequent table matching operation,
after which the table search operation is performed. The symbol registration operation
registers symbols in the LUT, and finally the symbolizing/lookup
operations are performed against the LUT. For decompression, the operations are
performed in the opposite way to translate the compressed data to an original data
pair.

Next we discuss an implementation example of time-sharing multi-threading
in the LCA-DLT. Assume that there are two input data streams for the
compressor/decompressor, and the pipeline of the compressor is organized as shown in
Fig. 16.20. The compression in both data streams takes eight cycles to process a data
pair, as does the decompression. The compression pipeline consists of the search
stage and the registration stage. The search stage compares the contents of the LUT
with the incoming data and then creates a match flag list, and the registration stage
updates the corresponding entry in the table according to the match flag list. The
decompression pipeline consists of the same stages but is organized in the opposite
direction.


16.4.2.2 Performance Evaluations 


Table 16.2 Performance comparisons of the time-sharing multi-threading (TSM) 

Xilinx XCKU025 | Frequency (MHz) | Combinational logic | Registers | RAM bits 
Compressor with TSM | 342 5,120 
Compressor without TSM | 0 
Decompressor with TSM | 17,408 
Decompressor without TSM | 4,096 


Here we discuss the performance effect of time-sharing multi-threading. The example 
structure with two data streams per module explained above is implemented on a 
Xilinx Kintex UltraScale FPGA XCKU025-FFVA1156-1-C, and Table 16.2 shows 
the comparisons. Compared with the clock frequency without time-sharing 
multi-threading, that with the optimization increases by a factor of approximately 1.23 
for compression and 1.08 for decompression, meaning that the total throughputs of 
the compressor and decompressor increase by the same factors. 
However, the total throughput is shared by the two data streams, so a single data stream 
achieves approximately 62% of the throughput without time-sharing 
multi-threading for compression and 54% for decompression. Regarding the resource usage 
given in Table 16.2, the optimization reduces the combinational logic by 23-65%. 
The number of registers in the compressor module increases by approximately 32%, 
whereas the number of registers in the decompressor module is reduced to a third of 
that for the implementation without the optimization. 
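
The per-stream percentages quoted above follow from dividing the frequency gain by the number of streams that share a module. A minimal Python check is given below; the function name per_stream_share is hypothetical, and the calculation assumes that throughput scales linearly with clock frequency and is split evenly among the streams.

```python
def per_stream_share(freq_gain: float, n_streams: int) -> float:
    """Throughput of one stream relative to a module without time sharing.

    Assumes throughput is proportional to clock frequency and that the
    streams share the module evenly.
    """
    return freq_gain / n_streams


# Frequency gains reported in the text, with two streams per module.
compress_share = per_stream_share(1.23, 2)    # about 0.615, i.e. roughly 62%
decompress_share = per_stream_share(1.08, 2)  # 0.54, i.e. 54%
```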


16.5 Related Works and Literatures 


The most important lossless-compression algorithm is LZW, which is simple and 
effective and can be found in lossless-compression software such as gz, bzip2, rar, 
and lzh. However, when attempting to implement a compressor on hardware, the 
problems discussed in this chapter inevitably arise. To implement compact hardware 
for LZW, we must prepare memory of the order of kilobytes. For example, Fowers 
et al. [3] and Kim et al. [5] solved the problem regarding the longest matching by 
parallelizing the operations. However, it is impossible to increase the size of the 
sliding dictionary because the number of start indices increases with the length of 
the symbols. Another important research topic is how to manage the symbol LUT in 
a limited memory space. 

The field of machine learning contains well-known algorithms such as lossy counting 
[9] and space saving [13]. However, these algorithms use operations based 
on pointers and are implemented in software. For a data stream with k different 
symbols, an attractive algorithm for frequency counting has been proposed in which 
the top-θk frequent items are counted exactly within O(1/θ) space [4] for any constant 
0 < θ < 1. However, this is also a software solution. Various hardware 
implementations of lossless data-compression techniques have been investigated in 
this decade, and a well-known approach is arithmetic coding (AC) [6], 
which is used widely to compress multimedia data. Arithmetic coding involves heavy 
computation with floating-point numbers to achieve high compression ratios. To 
avoid floating-point calculations, arithmetic coding based on binary numbers has 
been proposed [1, 7, 15]. However, it is not possible to avoid the potential fractional 
computation, which is why hardware implementations such as those by Pande et al. 
[16] and Mitchell et al. [14] have been proposed to accelerate the computing speed. 